Title: | Analysis of Geostatistical Count Data using Gaussian Copulas |
---|---|
Description: | Provides a variety of functions to analyze and model geostatistical count data with Gaussian copulas, including 1) data simulation and visualization; 2) correlation structure assessment (here also known as the Normal To Anything); 3) calculate multivariate normal rectangle probabilities; 4) likelihood inference and parallel prediction at predictive locations. Description of the method is available from: Han and DeOliveira (2018) <doi:10.18637/jss.v087.i13>. |
Authors: | Zifei Han |
Maintainer: | Zifei Han <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.8 |
Built: | 2024-12-09 06:39:34 UTC |
Source: | CRAN |
This dataset was studied by Johnson and Hoeting (2011) for analyzing pollution tolerance in Mid-Atlantic Highlands Fish. Pollution intolerant fish were sampled, and several stream characteristics were measured to assess water quality at 119 sites in the Mid-Atlantic region of the United States. All covariates of the data had been standardized to have mean 0 and variance 1.
data(AtlanticFish)
data(AtlanticFish)
A data frame with 119 observations and 12 variables.
LON
Longitude of the location.
LAT
Latitude of the location.
ABUND
Fish abundance at given locations, the discrete response.
ORDER
Strahler stream order, a natural covariate measuring stream size.
DISTOT
Watershed classified as disturbed by human activity, a variable reflecting stream quality.
HAB
An index of fish habitat quality at the stream site, a variable reflecting stream quality.
WSA
Watershed area, a natural covariate.
ELEV
Elevation.
RD
Road density in the watershed, a variable reflecting stream quality.
DO
Concentration of dissolved oxygen in the stream at the sampling site, a stream quality variable.
XFC
Percent of areal fish cover at the sampling site, a stream quality variable.
PCT
Percent of sand in streambed substrate, a stream quality variable.
Johnson, D. and Hoeting, J. (2011) Bayesian Multimodel Inference for Geostatistical Regression Models, PLOS ONE, 6:e25677. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0025677.
data(AtlanticFish) str(AtlanticFish)
data(AtlanticFish) str(AtlanticFish)
marginal.gc
The Beta marginal used for simulation and computing correlation in the
trans-Gaussian random field in function simgc
and corrTG
of the package gcKrig
.
It cannot be used in function mlegc
nor predgc
to make model inferences.
beta.gc(shape1 = 1, shape2 = 1)
beta.gc(shape1 = 1, shape2 = 1)
shape1 |
non-negative scalar, the shape parameter of the Beta distribution. |
shape2 |
non-negative scalar, another shape parameter of the Beta distribution. |
The Beta distribution with parameters shape1 = a
and shape2 = b
has density
for and
where the boundary values at
x = 0
or x = 1
are defined as by continuity (as limits).
An object of class marginal.gc
representing the marginal component.
Zifei Han [email protected]
marginal.gc
, binomial.gc
, gm.gc
,
gaussian.gc
, negbin.gc
,
poisson.gc
, weibull.gc
, zip.gc
marginal.gc
The binomial marginal parameterized in terms of its size and probability.
By default, this function is used for likelihood inference and spatial prediction in function
mlegc
and predgc
of the package gcKrig
.
When all marginal parameters are given, the function is used for simulation and computing correlation in a
trans-Gaussian random field in function simgc
and corrTG
.
binomial.gc(link = "logit", size = NULL, prob = NULL)
binomial.gc(link = "logit", size = NULL, prob = NULL)
link |
the model link function. |
size |
number of trials (zero or more). |
prob |
probability of success on each trial. |
An object of class marginal.gc
representing the marginal component.
Zifei Han [email protected]
marginal.gc
, beta.gc
,
gm.gc
, gaussian.gc
,
negbin.gc
, poisson.gc
,
weibull.gc
, zip.gc
Class of isotropic correlation functions available in the gcKrig
library.
By default, range parameter is not provided, so
this function is used for likelihood inference and spatial prediction with function mlegc
and predgc
.
Users need to specify if the correlation model includes a nugget effect
nugget = TRUE
or not nugget = FALSE
. For Matern and powered exponential correlation functions, the
shape parameter kappa
is also required from users.
When both range and nugget parameters are given,
the function is used to specify the correlation structure in simulation
with function simgc
in package gcKrig
.
At the moment, the following three correlation functins are implemented:
matern.gc |
the Matern correlation function. |
powerexp.gc |
the powered exponential correlation function. |
spherical.gc |
the spherical correlation function. |
Zifei Han [email protected]
De Oliveira, V. (2013) Hierarchical Poisson models for spatial count data. Journal of Multivariate Analysis,122:393-408.
Han, Z. and De Oliveira, V. (2018) gcKrig: An R Package for the Analysis of Geostatistical Count Data Using Gaussian Copulas. Journal of Statistical Software, 87(13), 1–32. doi:10.18637/jss.v087.i13.
matern.gc
,
powerexp.gc
,
spherical.gc
This function implements two general methods for computing the correlation function in a transformed Gaussian random field.
corrTG(marg1, marg2, corrGauss = 0.5, method = "integral", nrep = 1000, kmax = 10, earlystop = FALSE, epscut = 1e-3)
corrTG(marg1, marg2, corrGauss = 0.5, method = "integral", nrep = 1000, kmax = 10, earlystop = FALSE, epscut = 1e-3)
marg1 |
an object of class |
marg2 |
an object of class |
corrGauss |
the correlation in the Gaussian random field. Should be a scalar between 0 and 1. |
method |
the computation method of calculating correlation in the transformed Gaussian random field. Can be either "integral" or "mc". If use "integral" then a series expansion based on the Hermite Polynomials will be used to approximate the correlation, see De Oliveira (2013) or Han and De Oliveira (2016). If use "mc" then the Monte Carlo method will be used. |
nrep |
the Monte Carlo size in computing the correlation. Only need to be specified if |
kmax |
the maximum number of terms used in the series summation (with Hermite polynomial expansion).
Only need to be specified if |
earlystop |
whether or not to allow the series summation to stop early.
If |
epscut |
a small positive value used to truncate the series. |
If method = "mc"
the output is a scalar between 0 and 1, denoting the correlation of the transformed Gaussian random field. If method = "integral"
the output is a scalar of the correlation in the transformed Gaussian random field, and a list of the values in the series expansion based on the integral with Hermite polynomials.
Zifei Han [email protected]
De Oliveira, V. (2013) Hierarchical Poisson models for spatial count data. Journal of Multivariate Analysis,122:393-408.
Han, Z. and De Oliveira, V. (2016) On the correlation structure of Gaussian copula models for geostatistical count data. Australian and New Zealand Journal of Statistics, 58:47-69.
Han, Z. and De Oliveira, V. (2018) gcKrig: An R Package for the Analysis of Geostatistical Count Data Using Gaussian Copulas. Journal of Statistical Software, 87(13), 1–32. doi:10.18637/jss.v087.i13.
## Not run: corrTG(marg1 = poisson.gc(lambda = 10), marg2 = binomial.gc(size = 1, prob = 0.1), corrGauss = 0.5, method = "integral", kmax = 10, earlystop = TRUE, epscut = 1e-5) set.seed(12345) corrTG(marg1 = poisson.gc(lambda = 10), marg2 = binomial.gc(size = 1, prob = 0.1), corrGauss = 0.5, nrep = 100000, method = "mc") ## End(Not run)
## Not run: corrTG(marg1 = poisson.gc(lambda = 10), marg2 = binomial.gc(size = 1, prob = 0.1), corrGauss = 0.5, method = "integral", kmax = 10, earlystop = TRUE, epscut = 1e-5) set.seed(12345) corrTG(marg1 = poisson.gc(lambda = 10), marg2 = binomial.gc(size = 1, prob = 0.1), corrGauss = 0.5, nrep = 100000, method = "mc") ## End(Not run)
This function implemented the method of computing the Frechet Hoeffding upper bound for discrete marginals described in Nelsen (1987), which can only be applied to discrete marginals. Four commonly used marginal distributions were implemented. The distribution "nb" (negative binomial) and "zip" (zero-inflated Poisson) are parameterized in terms of the mean and overdispersion, see Han and De Oliveira (2016).
FHUBdiscrete(marg1, marg2, mu1, mu2, od1 = 0, od2 = 0, binomial.size1 = 1, binomial.size2 = 1)
FHUBdiscrete(marg1, marg2, mu1, mu2, od1 = 0, od2 = 0, binomial.size1 = 1, binomial.size2 = 1)
marg1 |
name of the first discrete marginal distribution. Should be one of the "poisson", "zip", "nb" or "binomial". |
marg2 |
name of the second discrete marginal distribution. Should be one of the "poisson", "zip", "nb" or "binomial". |
mu1 |
mean of the first marginal distribution. If binomial then it is |
mu2 |
mean of the second marginal distribution. If binomial then it is |
od1 |
the overdispersion parameter of the first marginal. Only used when marginal distribution is either "zip" or "nb". |
od2 |
the overdispersion parameter of the second marginal. Only used when marginal distribution is either "zip" or "nb". |
binomial.size1 |
the size parameter (number of trials) when |
binomial.size2 |
the size parameter (number of trials) when |
A scalar denoting the Frechet Hoeffding upper bound of the two specified marginal.
Zifei Han [email protected]
Nelsen, R. (1987) Discrete bivariate distributions with given marginals and correlation. Communications in Statistics Simulation and Computation, 16:199-208.
Han, Z. and De Oliveira, V. (2016) On the correlation structure of Gaussian copula models for geostatistical count data. Australian and New Zealand Journal of Statistics, 58:47-69.
Han, Z. and De Oliveira, V. (2018) gcKrig: An R Package for the Analysis of Geostatistical Count Data Using Gaussian Copulas. Journal of Statistical Software, 87(13), 1–32. doi:10.18637/jss.v087.i13.
## Not run: FHUBdiscrete(marg1 = 'nb', marg2 = 'zip',mu1 = 10, mu2 = 2, od1 = 2, od2 = 0.2) FHUBdiscrete(marg1 = 'binomial', marg2 = 'zip', mu1 = 10, mu2 = 4, binomial.size1 = 25, od2 = 2) FHUBdiscrete(marg1 = 'binomial', marg2 = 'poisson', mu1 = 0.3, mu2 = 20, binomial.size1 = 1) NBmu = seq(0.01, 30, by = 0.02) fhub <- c() for(i in 1:length(NBmu)){ fhub[i] = FHUBdiscrete(marg1 = 'nb', marg2 = 'nb',mu1 = 10, mu2 = NBmu[i], od1 = 0.2, od2 = 0.2) } plot(NBmu, fhub, type='l') ## End(Not run)
## Not run: FHUBdiscrete(marg1 = 'nb', marg2 = 'zip',mu1 = 10, mu2 = 2, od1 = 2, od2 = 0.2) FHUBdiscrete(marg1 = 'binomial', marg2 = 'zip', mu1 = 10, mu2 = 4, binomial.size1 = 25, od2 = 2) FHUBdiscrete(marg1 = 'binomial', marg2 = 'poisson', mu1 = 0.3, mu2 = 20, binomial.size1 = 1) NBmu = seq(0.01, 30, by = 0.02) fhub <- c() for(i in 1:length(NBmu)){ fhub[i] = FHUBdiscrete(marg1 = 'nb', marg2 = 'nb',mu1 = 10, mu2 = NBmu[i], od1 = 0.2, od2 = 0.2) } plot(NBmu, fhub, type='l') ## End(Not run)
marginal.gc
The Gaussian marginal used for simulation and computing correlation in the
trans-Gaussian random field in function simgc
and corrTG
of the package gcKrig
.
It cannot be used in function mlegc
nor predgc
to make model inferences.
gaussian.gc(mean = 0, sd = 1)
gaussian.gc(mean = 0, sd = 1)
mean |
the mean of the Gaussian distribution, a scalar. |
sd |
a positive scalar, the standard deviation of the Gaussian distribution. |
An object of class marginal.gc
representing the marginal component.
Zifei Han [email protected]
marginal.gc
, beta.gc
, binomial.gc
,
gm.gc
, negbin.gc
,
poisson.gc
, weibull.gc
, zip.gc
marginal.gc
The Gamma marginal used for simulation and computing correlation in the
trans-Gaussian random field in functions simgc
and corrTG
of the package gcKrig
.
It cannot be used in functions mlegc
nor predgc
to make model inferences.
gm.gc(shape = 1, rate = 1)
gm.gc(shape = 1, rate = 1)
shape |
a non-negative scalar, shape parameter of the Gamma distribution. |
rate |
a non-negative scalar, rate parameter of the Gamma distribution. |
The Gamma distribution with parameters shape = a
and rate = r
has density
for ,
and
.
An object of class marginal.gc
representing the marginal component.
Zifei Han [email protected]
marginal.gc
, beta.gc
, binomial.gc
,
gaussian.gc
, negbin.gc
,
poisson.gc
, weibull.gc
, zip.gc
The data is aggregated from the dataset lansing
in library spatstat
, which
came from an investigation of a 924 ft x 924 ft (19.6 acres) plot in Lansing Woods, Clinton
County, Michigan USA by D.J. Gerrard.
The original point process data described the locations of 2,251 trees and their
botanical classification (into maples, hickories, black oaks, red oaks, white oaks and miscellaneous trees). The original plot size has been rescaled to the unit square and the number of different types of trees
has been counted within squares of length 1/16.
data(LansingTrees)
data(LansingTrees)
A data frame with 256 observations and 8 variables.
Easting
Cartesian x-coordinate of the locations.
Northing
Cartesian y-coordinate of the locations.
maple
Number of maples in the area.
hickory
Number of hickories in the area.
blackoak
Number of black oaks in the area.
redoak
Number of red oaks in the area.
whiteoak
Number of white oaks in the area.
misc
Number of miscellaneous trees in the area.
Kazianka, H. (2013) Approximate copula-based estimation and prediction of discrete spatial data. Stoch Environ Res Risk Assess, 27:2015-2026
data(LansingTrees) str(LansingTrees)
data(LansingTrees) str(LansingTrees)
Class of marginals available in gcKrig
library for geostatistical data simulation,
correlation structure assessment (both continuous and discrete marginals)
and model inferences (discrete marginals only).
In former cases parameters of
the marginals are given by users, otherwise
parameters are estimated from the data (except when doing prediction
with function predgc
,
users can choose to either input known estimates or
estimate the parameters with input data).
The package gcKrig
does not include inference and prediction functionalities
for continuous marginals. For inference with continuous marginals, see Masarotto and Varin (2012).
By default, when the marginals are discrete, they are used for
estimation with function mlegc
and prediction with
function predgc
.
They can be used in function simgc
and corrTG
as well for
the purpose of data simulation and correlation computation in a transformed
Gaussian random field (Han and De Oliveira, 2016), if parameter values are specified.
For continuous marginals, they are used for simulation with function
simgc
and correlation computation with corrTG
only, so
parameters should always be specified.
At the moment, the following marginals are implemented:
beta.gc |
beta marginals. |
binomial.gc |
binomial marginals. |
gm.gc |
gamma marginals. |
gaussian.gc |
Gaussian marginals. |
negbin.gc |
negative binomial marginals. |
poisson.gc |
Poisson marginals. |
weibull.gc |
Weibull marginals. |
zip.gc |
zero-inflated Poisson marginals. |
Zifei Han [email protected]
Cribari-Neto, F. and Zeileis, A. (2010) Beta regression in R. Journal of Statistical Software, 34(2), 1–24. doi:10.18637/jss.v034.i02.
Ferrari, S.L.P. and Cribari-Neto, F. (2004) Beta regression for modeling rates and proportions. Journal of Applied Statistics, 31:799-815.
Han, Z. and De Oliveira, V. (2016) On the correlation structure of Gaussian copula models for geostatistical count data. Australian and New Zealand Journal of Statistics, 58:47-69.
Masarotto, G. and Varin, C. (2012) Gaussian copula marginal regression. Electronic Journal of Statistics, 6:1517-1549.
Masarotto, G. and Varin C. (2017) Gaussian Copula Regression in R. Journal of Statistical Software, 77(8), 1–26. doi:10.18637/jss.v077.i08.
Han, Z. and De Oliveira, V. (2018) gcKrig: An R Package for the Analysis of Geostatistical Count Data Using Gaussian Copulas. Journal of Statistical Software, 87(13), 1–32. doi:10.18637/jss.v087.i13.
beta.gc
,
binomial.gc
,
gm.gc
,
gaussian.gc
,
negbin.gc
,
poisson.gc
,
weibull.gc
,
zip.gc
corr.gc
The Matern correlation function in spatial statistics.
By default, range parameter is not available, so
this function is used for likelihood inference and spatial prediction in function mlegc
and predgc
.
Users need to specify the shape parameter kappa
and if the correlation model includes a nugget effect
nugget = TRUE
or not nugget = FALSE
.
When both range and nugget parameters are given,
the function is used for simulation with function simgc
in package gcKrig
.
matern.gc(range = NULL, kappa = 0.5, nugget = TRUE)
matern.gc(range = NULL, kappa = 0.5, nugget = TRUE)
range |
a non-negative scalar of the range parameter in Matern correlation function. |
kappa |
a non-negative scalar of the shape parameter in the Matern correlation function. The default kappa = 0.5 corresponds to an exponential correlation model. |
nugget |
the nugget effect of the correlation function. If specified, it must be a scalar between 0 and 1. |
The Matern correlation function with a nugget is of the form:
when and
when
. Here
is range parameter,
is the shape parameter and
is the nugget parameter.
denotes the modified Bessel function of the third
kind of order
.
An object of class corr.gc
representing the correlation component.
Zifei Han [email protected]
Diggle, P. and Ribeiro, P.J. (2007) Model-based Geostatistics. Springer.
Computes the maximum likelihood estimates. Two methods are implemented. If
method = 'GHK'
then the maximum simulated likelihood estimates are computed, if
method = 'GQT'
then the maximum surrogate likelihood estimates are computed.
mlegc(y, x = NULL, locs, marginal, corr, effort = 1, longlat = FALSE, distscale = 1, method = "GHK", corrpar0 = NULL, ghkoptions = list(nrep = c(100, 1000), reorder = FALSE, seed = 12345))
mlegc(y, x = NULL, locs, marginal, corr, effort = 1, longlat = FALSE, distscale = 1, method = "GHK", corrpar0 = NULL, ghkoptions = list(nrep = c(100, 1000), reorder = FALSE, seed = 12345))
y |
a non-negative integer vector of response with its length equals to the number of sampling locations. |
x |
a numeric matrix or data frame of covariates,
with its number of rows equals to the number of sampling locations.
If no covariates then |
locs |
a numeric matrix or data frame of n-D points with row denoting points. The first column is x or longitude, the second column is y or latitude. The number of locations is equal to the number of rows. |
marginal |
an object of class |
corr |
an object of class |
effort |
the sampling effort. For binomial marginal it is the size parameter (number of trials). See details. |
longlat |
if FALSE, use Euclidean distance, if TRUE use great circle distance. The default is FALSE. |
distscale |
a numeric scaling factor for computing distance. If original distance is in kilometers, then
|
method |
two methods are implemented. If
|
corrpar0 |
the starting value of correlation parameter in the optimization procedure.
If |
ghkoptions |
a list of three elements that only need to be specified if
|
This program implemented one simulated likelihood method via sequential importance
sampling (see Masarotto and Varin 2012), which is same as the method implemented in package
gcmr
(Masarotto and Varin 2016) except an antithetic variable is used. It also implemented
one surrogate likelihood method via distributional transform (see Kazianka and Pilz 2010), which is
generally faster.
The argument effort
is the sampling effort (known). It can be used to consider the heterogeneity
of the measurement time or area at different locations.
The default is 1 for all locations. See Han and De Oliveira (2016) for more details.
A list of class "mlegc" with the following elements:
MLE |
the maximum likelihood estimate. |
x |
the design matrix. |
nug |
1 if |
nreg |
number of regression parameters. |
log.lik |
the value of the maximum log-likelihood. |
AIC |
the Akaike information criterion. |
AICc |
the AICc information criterion; essentially AIC with a greater penalty for extra parameters. |
BIC |
the Bayesian information criterion. |
kmarg |
number of marginal parameters. |
par.df |
number of parameters. |
N |
number of observations. |
D |
the distance matrix. |
optlb |
lower bound in optimization. |
optub |
upper bound in optimization. |
hessian |
the hessian matrix evaluated at the final estimates. |
args |
arguments passed in function evaluation. |
Zifei Han [email protected]
Han, Z. and De Oliveira, V. (2016) On the correlation structure of Gaussian copula models for geostatistical count data. Australian and New Zealand Journal of Statistics, 58:47-69.
Kazianka, H. and Pilz, J. (2010) Copula-based geostatistical modeling of continuous and discrete data including covariates. Stoch Environ Res Risk Assess 24:661-673.
Masarotto, G. and Varin, C. (2012) Gaussian copula marginal regression. Electronic Journal of Statistics 6:1517-1549. https://projecteuclid.org/euclid.ejs/1346421603.
Masarotto, G. and Varin C. (2017). Gaussian Copula Regression in R. Journal of Statistical Software, 77(8), 1–26. doi:10.18637/jss.v077.i08.
Han, Z. and De Oliveira, V. (2018) gcKrig: An R Package for the Analysis of Geostatistical Count Data Using Gaussian Copulas. Journal of Statistical Software, 87(13), 1–32. doi:10.18637/jss.v087.i13.
## Not run: ## Fit a Simulated Dataset with 100 locations grid <- seq(0.05, 0.95, by = 0.1) xloc <- expand.grid(x = grid, y = grid)[,1] yloc <- expand.grid(x = grid, y = grid)[,2] set.seed(123) simData1 <- simgc(locs = cbind(xloc,yloc), sim.n = 1, marginal = negbin.gc(mu = exp(1+xloc), od = 1), corr = matern.gc(range = 0.4, kappa = 0.5, nugget = 0)) simFit1 <- mlegc(y = simData1$data, x = xloc, locs = cbind(xloc,yloc), marginal = negbin.gc(link = 'log'), corr = matern.gc(kappa = 0.5, nugget = FALSE), method = 'GHK') simFit2 <- mlegc(y = simData1$data, x = xloc, locs = cbind(xloc,yloc), marginal = negbin.gc(link = 'log'), corr = matern.gc(kappa = 0.5, nugget = FALSE), method = 'GQT') #summary(simFit1);summary(simFit2) #plot(simFit1);plot(simFit2) ## Time consuming examples ## Fit a real dataset with 70 sampling locations. data(Weed95) weedobs <- Weed95[Weed95$dummy==1, ] weedpred <- Weed95[Weed95$dummy==0, ] Weedfit1 <- mlegc(y = weedobs$weedcount, x = weedobs[,4:5], locs = weedobs[,1:2], marginal = poisson.gc(link='log'), corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK') summary(Weedfit1) plot(Weedfit1) ## Fit a real dataset with 256 locations data(LansingTrees) Treefit1 <- mlegc(y = LansingTrees[,3], x = LansingTrees[,4], locs = LansingTrees[,1:2], marginal = negbin.gc(link = 'log'), corr = matern.gc(kappa = 0.5, nugget = FALSE), method = 'GHK') summary(Treefit1) plot(Treefit1) # Try to use GQT method Treefit2<- mlegc(y = LansingTrees[,3], x = LansingTrees[,4], locs = LansingTrees[,1:2], marginal = poisson.gc(link='log'), corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GQT') summary(Treefit2) plot(Treefit2) ## Fit a real dataset with randomized locations data(AtlanticFish) Fitfish <- mlegc(y = AtlanticFish[,3], x = AtlanticFish[,4:6], locs = AtlanticFish[,1:2], longlat = TRUE, marginal = negbin.gc(link='log'), corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK') summary(Fitfish) ## Fit a real dataset with binomial counts; see Masarotto and Varin (2016). library(gcmr) data(malaria) malariax <- data.frame(netuse = malaria$netuse, green = malaria$green/100, phc = malaria$phc) Fitmalaria <- mlegc(y = malaria$cases, x = malariax, locs = malaria[,1:2], marginal = binomial.gc(link='logit'), corrpar0 = 1.5, corr = matern.gc(kappa = 0.5, nugget = FALSE), distscale = 0.001, effort = malaria$size, method = 'GHK') summary(Fitmalaria) ## Fit a real spatial binary dataset with 333 locations using probit link data(OilWell) Oilest1 <- mlegc(y = OilWell[,3], x = NULL, locs = OilWell[,1:2], marginal = binomial.gc(link = 'probit'), corr = matern.gc(nugget = TRUE), method = 'GHK') summary(Oilest1) plot(Oilest1, col = 2) ## End(Not run)
## Not run: ## Fit a Simulated Dataset with 100 locations grid <- seq(0.05, 0.95, by = 0.1) xloc <- expand.grid(x = grid, y = grid)[,1] yloc <- expand.grid(x = grid, y = grid)[,2] set.seed(123) simData1 <- simgc(locs = cbind(xloc,yloc), sim.n = 1, marginal = negbin.gc(mu = exp(1+xloc), od = 1), corr = matern.gc(range = 0.4, kappa = 0.5, nugget = 0)) simFit1 <- mlegc(y = simData1$data, x = xloc, locs = cbind(xloc,yloc), marginal = negbin.gc(link = 'log'), corr = matern.gc(kappa = 0.5, nugget = FALSE), method = 'GHK') simFit2 <- mlegc(y = simData1$data, x = xloc, locs = cbind(xloc,yloc), marginal = negbin.gc(link = 'log'), corr = matern.gc(kappa = 0.5, nugget = FALSE), method = 'GQT') #summary(simFit1);summary(simFit2) #plot(simFit1);plot(simFit2) ## Time consuming examples ## Fit a real dataset with 70 sampling locations. data(Weed95) weedobs <- Weed95[Weed95$dummy==1, ] weedpred <- Weed95[Weed95$dummy==0, ] Weedfit1 <- mlegc(y = weedobs$weedcount, x = weedobs[,4:5], locs = weedobs[,1:2], marginal = poisson.gc(link='log'), corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK') summary(Weedfit1) plot(Weedfit1) ## Fit a real dataset with 256 locations data(LansingTrees) Treefit1 <- mlegc(y = LansingTrees[,3], x = LansingTrees[,4], locs = LansingTrees[,1:2], marginal = negbin.gc(link = 'log'), corr = matern.gc(kappa = 0.5, nugget = FALSE), method = 'GHK') summary(Treefit1) plot(Treefit1) # Try to use GQT method Treefit2<- mlegc(y = LansingTrees[,3], x = LansingTrees[,4], locs = LansingTrees[,1:2], marginal = poisson.gc(link='log'), corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GQT') summary(Treefit2) plot(Treefit2) ## Fit a real dataset with randomized locations data(AtlanticFish) Fitfish <- mlegc(y = AtlanticFish[,3], x = AtlanticFish[,4:6], locs = AtlanticFish[,1:2], longlat = TRUE, marginal = negbin.gc(link='log'), corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK') summary(Fitfish) ## Fit a real dataset with binomial counts; see Masarotto and Varin (2016). library(gcmr) data(malaria) malariax <- data.frame(netuse = malaria$netuse, green = malaria$green/100, phc = malaria$phc) Fitmalaria <- mlegc(y = malaria$cases, x = malariax, locs = malaria[,1:2], marginal = binomial.gc(link='logit'), corrpar0 = 1.5, corr = matern.gc(kappa = 0.5, nugget = FALSE), distscale = 0.001, effort = malaria$size, method = 'GHK') summary(Fitmalaria) ## Fit a real spatial binary dataset with 333 locations using probit link data(OilWell) Oilest1 <- mlegc(y = OilWell[,3], x = NULL, locs = OilWell[,1:2], marginal = binomial.gc(link = 'probit'), corr = matern.gc(nugget = TRUE), method = 'GHK') summary(Oilest1) plot(Oilest1, col = 2) ## End(Not run)
Computes the multivariate normal rectangle probability for arbitrary limits and covariance matrices using (reordered) sequential importance sampling.
mvnintGHK(mean, sigma, lower, upper, nrep = 5000, log = TRUE, reorder = TRUE)
mvnintGHK(mean, sigma, lower, upper, nrep = 5000, log = TRUE, reorder = TRUE)
mean |
the numeric vector of mean of length |
sigma |
the covariance matrix of dimension |
lower |
the numeric vector of lower limits of length |
upper |
the numeric vector of upper limits of length |
nrep |
a positive integer of Monte Carlo size. |
log |
if TRUE then return the log of the probability. If FALSE return the probability. |
reorder |
if TRUE then variable reordering algorithm is applied. If FALSE then original ordering is used. |
This program implemented the Geweke-Hajivassiliou-Keane simulator of computing the multivariate normal rectangle probability. For more details see Keane (1994). Also a variable reordering algorithm in Gibson, etal (1994) was implemented.
Note that both -Inf
and Inf
may be specified in lower
and upper
.
A list of the following two components:
value |
the value of the integral. If |
error |
the Monte Carlo standard deviation. |
Zifei Han [email protected]
Gibson GJ., Glasbey CA. and Elston DA. (1994) Monte Carlo evaluation of multivariate normal integrals and sensitivity to variate ordering. Advances in Numerical Methods and Applications, World Scientific Publishing, River Edge.
Keane, M. (1994) A computationally practical simulation estimator for panel data. Econometrica, 62:95-116.
mvnintGHK(mean = rep(0, 51), sigma = diag(0.2, 51) + matrix(0.8, 51, 51), lower = rep(-2,51), upper = rep(2,51), nrep = 10000)
mvnintGHK(mean = rep(0, 51), sigma = diag(0.2, 51) + matrix(0.8, 51, 51), lower = rep(-2,51), upper = rep(2,51), nrep = 10000)
marginal.gc
The negative binomial marginal parameterized in terms of its mean and overdispersion.
By default, this function is used for likelihood inference and spatial prediction in functions
mlegc
and predgc
of the package gcKrig
.
When all marginal parameters are given, the function is used for simulation and computing correlation in a
trans-Gaussian random field in functions simgc
and corrTG
.
negbin.gc(link = "log", mu = NULL, od = NULL)
negbin.gc(link = "log", mu = NULL, od = NULL)
link |
the model link function. |
mu |
a non-negative scalar of the mean parameter. |
od |
a non-negative scalar of the overdispersion parameter. |
The negative binomial distribution with parameters mu = a
and od = 1/b
has density
which is called NB2 by Cameron and Trivedi (2013).
Under this parameterization, , where
is the mean parameter and
is the overdispersion parameter.
For more details see Han and De Oliveira (2016).
An object of class marginal.gc
representing the marginal component.
Zifei Han [email protected]
Cameron,A.C. and Trivedi,P.K. (2013) Regression Analysis of Count Data. Cambridge University Press, 2nd Edition.
Han, Z. and De Oliveira, V. (2016) On the correlation structure of Gaussian copula models for geostatistical count data. Australian and New Zealand Journal of Statistics, 58:47-69.
marginal.gc
, beta.gc
,
binomial.gc
, gm.gc
,
gaussian.gc
, poisson.gc
,
weibull.gc
, zip.gc
A dataset recording locations of successful and unsuccessful drilling oil wells in the northwest shelf of Delaware basin in New Mexico, a region that is densely drilled but has some sparsely drilled areas. The original dataset was transformed to a central area of about 65 square kilometers, see Hohn (1999), Chapter 6.
data(OilWell)
data(OilWell)
A data frame with 333 observations and 3 variables.
Easting
Cartesian x-coordinate of the locations.
Northing
Cartesian y-coordinate of the locations.
Success
A binary variable indicating the success of the drill at given locations. 1 for successful drilling oil wells and 0 for unsuccess.
Hohn, M. (1999) Geostatistics and petroleum geology, Second Edition. Springer Science & Business Media, APA
data(OilWell) str(OilWell)
data(OilWell) str(OilWell)
Four plots can be generated: the 2-D level plot or 3-D scatterplot with the number of counts or fitted values.
## S3 method for class 'mlegc' plot(x, plotdata = "Observed", plottype = "2D", xlab = "xloc", ylab = "yloc", xlim = NULL, ylim = NULL, pch = 20, textcex = 0.8, plotcex = 1, angle = 60, col = 4, col.regions = gray(90:0/100),...)
## S3 method for class 'mlegc' plot(x, plotdata = "Observed", plottype = "2D", xlab = "xloc", ylab = "yloc", xlim = NULL, ylim = NULL, pch = 20, textcex = 0.8, plotcex = 1, angle = 60, col = 4, col.regions = gray(90:0/100),...)
x |
an object of class |
plotdata |
the data to be plotted. Can be either "Observed" if the original counts are used or "Fitted" if the fitted mean at different locations are used. |
plottype |
the type of the plot. Can be either "2D" for 2-D level plot or "3D" for 3-D scatterplot. |
xlab , ylab
|
a title for the x and y axis. |
xlim , ylim
|
numeric vectors of length 2, giving the x and y coordinates ranges.
if they equal to |
pch |
plotting character, i.e., the symbol to use in the 3-D scatter plot. |
textcex |
a numerical value giving the amount by which plotting text should be magnified relative to the default. |
plotcex |
a numerical value giving the amount by which plotting symbols should be magnified relative to the default. |
angle |
angle between x and y axis. |
col |
color of the text. |
col.regions |
color vector to be used reflecting magnitude of the dataset at different locations. The general idea is that this should be a color vector of gradually varying color. |
... |
further arguments passed to plot and panel settings. |
Zifei Han [email protected]
Five plots can be generated. A level plot with the number of counts at both observed and prediction locations; a level plot with predicted means (intensity); a level plot with the predicted counts; a level plot with estimated variances of the prediction; a 3-D scatter plot with both observed and predicted counts.
## S3 method for class 'predgc' plot(x, plottype = "2D", xlab = "xloc", ylab = "yloc", xlim = NULL, ylim = NULL, pch = 20, textcex = 0.6, plotcex = 1, angle = 60, col = c(2, 4), col.regions = gray(90:0/100),...)
## S3 method for class 'predgc' plot(x, plottype = "2D", xlab = "xloc", ylab = "yloc", xlim = NULL, ylim = NULL, pch = 20, textcex = 0.6, plotcex = 1, angle = 60, col = c(2, 4), col.regions = gray(90:0/100),...)
x |
an object of class |
plottype |
can be one of the following: "2D", "Predicted Counts", "Predicted Mean", "Predicted Variance" or "3D". Default is "2D" which generates a 2-D contour plot with both observed and predicted counts. With arguments "Predicted Counts", "Predicted Mean" and "Predicted Variance", a 2-D level plot will be generated with the corresponding data. When "3D" is used, a 3-D scatter plot will be displayed with observed and predicted counts. |
xlab , ylab
|
a title for the x and y axis. |
xlim , ylim
|
numeric vectors of length 2, giving the x and y coordinates ranges.
if they equal to |
pch |
plotting character, i.e., symbol to use in the 3-D scatter plot. |
textcex |
a numerical value giving the amount by which plotting text should be magnified relative to the default. |
plotcex |
a numerical value giving the amount by which plotting symbols should be magnified relative to the default. |
angle |
angle between x and y axis. |
col |
a numeric vector of length 2 indicating color of the plot at sampling and prediction locations. |
col.regions |
color vector to be used reflecting magnitude of the dataset at different locations. The general idea is that this should be a color vector of gradually varying color. |
... |
further arguments passed to plot and panel settings. |
Zifei Han [email protected]
plot.simgc
,
plot.mlegc
,
mlegc
,
predgc
Three plots will be generated. A level plot with the number of counts at given locations; a level plot with point referenced locations and varying colors and a 3-D scatter plot.
## S3 method for class 'simgc' plot(x, index = 1, plottype = "Text", xlab = "xloc", ylab = "yloc", xlim = NULL, ylim = NULL, pch = 20, textcex = 0.8, plotcex = 1, angle = 60, col = 4, col.regions = gray(90:0/100),...)
## S3 method for class 'simgc' plot(x, index = 1, plottype = "Text", xlab = "xloc", ylab = "yloc", xlim = NULL, ylim = NULL, pch = 20, textcex = 0.8, plotcex = 1, angle = 60, col = 4, col.regions = gray(90:0/100),...)
x |
an object of class |
index |
the index of the simulated data, need to be specified since |
plottype |
the type of the printed plot, can be "Text", "Dot", or "3D". When plottype = "Text", a 2-D plot is generated with exact counts at observed locations. When plottype = "Dot", a 2-D dot plot is generated and when plottype = "3D" a 3-D scatter plot is printed. |
xlab , ylab
|
a title for the x and y axis. |
xlim , ylim
|
numeric vectors of length 2, giving the x and y coordinates ranges.
if they equal to |
pch |
plotting character, i.e., symbol to use in the 3-D scatter plot. |
textcex |
a numerical value giving the amount by which plotting text should be magnified relative to the default. |
plotcex |
a numerical value giving the amount by which plotting symbols should be magnified relative to the default. |
angle |
angle between x and y axis. |
col |
color of the text. |
col.regions |
color vector to be used reflecting magnitude of the dataset at different locations. The general idea is that this should be a color vector of gradually varying color. |
... |
further arguments passed to plot and panel settings. |
Zifei Han [email protected]
This funtion generates two plots describing a geostatistical count data. The first plot is a bubble plot with size proportional to the response. The second plot is a lattice plot with text describing the number of counts.
plotgc(data = NULL, locs = NULL, bdry = NULL, col = 2, pch = 1, textcex = 1, col.regions = gray(90:0/100), size=c(0.3, 2.7), ...)
plotgc(data = NULL, locs = NULL, bdry = NULL, col = 2, pch = 1, textcex = 1, col.regions = gray(90:0/100), size=c(0.3, 2.7), ...)
data |
the geostatistical count response. |
locs |
a n by 2 matrix or data frame that indicates the coordinates of locations. |
bdry |
a list containing the coordinates of boundaries. |
col |
the color used for response variable in both plots. |
pch |
the shape used for response variable in the first plot. |
textcex |
a numerical value giving the amount by which plotting text should be magnified relative to the default. |
col.regions |
color vector to be used reflecting magnitude of the dataset at different locations. The general idea is that this should be a color vector of gradually varying color. |
size |
the minimum and maximum of the sizes in the first plot. |
... |
other parameters that control the plotting. |
Zifei Han [email protected]
plot.simgc
plot.mlegc
,
plot.predgc
marginal.gc
The Poisson marginal parameterized in terms of its mean.
By default, this function is used for likelihood inference and spatial prediction in function
mlegc
and predgc
of the package gcKrig
.
When all marginal parameters are given, the function is used for simulation and computing correlation in a
trans-Gaussian random field in function simgc
and corrTG
.
poisson.gc(link = "log", lambda = NULL)
poisson.gc(link = "log", lambda = NULL)
link |
the model link function. |
lambda |
a non-negative scalar of the mean parameter. |
An object of class marginal.gc
representing the marginal component.
Zifei Han [email protected]
marginal.gc
, beta.gc
,
binomial.gc
, gm.gc
,
gaussian.gc
, negbin.gc
,
weibull.gc
, zip.gc
corr.gc
The powered exponential correlation function in spatial statistics.
By default, range parameter is not available, so
this function is used for likelihood inference and spatial prediction in function mlegc
and predgc
.
Users need to specify the shape parameter kappa
and if the correlation model includes a nugget effect
nugget = TRUE
or not nugget = FALSE
.
When both range and nugget parameters are given,
the function is used for simulation with function simgc
in package gcKrig
.
powerexp.gc(range = NULL, kappa = 1, nugget = TRUE)
powerexp.gc(range = NULL, kappa = 1, nugget = TRUE)
range |
a non-negative scalar of the range parameter in powered exponential correlation function. |
kappa |
a scalar between 0 and 2; the value of the shape parameter in the powered exponential correlation function. |
nugget |
the nugget effect of the correlation function. If specified, it must be a scalar between 0 and 1. |
The powered exponential correlation function with a nugget is of the form:
when and
when
.
Here
is distance,
is range parameter,
is the shape parameter and
is the nugget effect.
When using the powered exponential correlation function, note that .
An object of class corr.gc
representing the correlation component.
Zifei Han [email protected]
Computes the plug-in prediction at unobserved sites. Two methods are implemented. If
method = 'GHK'
then the maximum simulated likelihood estimates are computed and
the sequential importance sampling method is used in the integral evaluation. If
method = 'GQT'
then the maximum surrogate likelihood estimates are computed and the
generalized quantile transform method is used in integral approximation.
predgc(obs.y, obs.x = NULL, obs.locs, pred.x = NULL, pred.locs, longlat = FALSE, distscale = 1, marginal, corr, obs.effort = 1, pred.effort = 1, method = "GHK", estpar = NULL, corrpar0 = NULL, pred.interval = NULL, parallel = FALSE, ghkoptions = list(nrep = c(100,1000), reorder = FALSE, seed = 12345), paralleloptions = list(n.cores = 2, cluster.type = "SOCK"))
predgc(obs.y, obs.x = NULL, obs.locs, pred.x = NULL, pred.locs, longlat = FALSE, distscale = 1, marginal, corr, obs.effort = 1, pred.effort = 1, method = "GHK", estpar = NULL, corrpar0 = NULL, pred.interval = NULL, parallel = FALSE, ghkoptions = list(nrep = c(100,1000), reorder = FALSE, seed = 12345), paralleloptions = list(n.cores = 2, cluster.type = "SOCK"))
obs.y |
a non-negative integer vector of observed response with its length equals to the number of observed locations. |
obs.x |
a numeric matrix or data frame of covariates at observed locations,
with its number of rows equals to the number of observed locations.
If no covariates then |
obs.locs |
a numeric matrix or data frame of observed locations.obs.effort The first column is x or longitude, the second column is y or latitude. The number of observed locations is equal to the number of rows. |
pred.x |
a numeric matrix or data frame of covariates at prediction locations,
with its number of rows equals to the number of prediction locations.
If no covariates then |
pred.locs |
a numeric matrix or data frame of prediction locations. First column is x or longitude, second column is y or latitude. The number of prediction locations equals to the number of rows. |
longlat |
if FALSE, use Euclidean distance, if TRUE use great circle distance. The default is FALSE. |
distscale |
a numeric scaling factor for computing distance. If original distance is in kilometers, then
|
marginal |
an object of class |
corr |
an object of class |
obs.effort |
sampling effort at observed locations. For binomial marginal it is the size parameter (number of trials). See details. |
pred.effort |
sampling effort at prediction locations. For binomial marginal it is the size parameter (number of trials). See details. |
method |
two methods are implemented. If
|
estpar |
if |
corrpar0 |
the starting value of correlation parameters in optimization procedure.
If |
pred.interval |
a number between |
parallel |
if |
ghkoptions |
a list of three elements that only need to be specified if
|
paralleloptions |
a list of two elements that only need to be specified if
|
This program implemented two methods in predicting the response at unobserved sites. See mlegc
.
The argument obs.effort
and pred.effort
are the sampling effort (known). It can be used to consider heterogeneity of
the measurement time or area at different locations. The default is 1 for all locations.
See Han and De Oliveira (2016) for more details.
The program computes two types of prediction intervals at a given confidence level. The shortest prediction interval is obtained from evaluating the highest to lowest prediction densities; the equal tail prediction interval has equal tail probabilities.
A list of class "predgc" with the following elements:
obs.locs |
observed locations. |
obs.y |
observed values at observed locations. |
pred.locs |
prediction locations. |
predValue |
the expectation of the conditional predictive distribution. |
predCount |
predicted counts; the closest integer that |
predVar |
estimated variance of the prediction at prediction locations. |
ConfidenceLevel |
confidence level (between 0 to 1) if prediction interval is computed. |
predInterval.EqualTail |
equal-tail prediction interval. |
predInterval.Shortest |
shortest length prediction interval. |
Zifei Han [email protected]
Han, Z. and De Oliveira, V. (2016) On the correlation structure of Gaussian copula models for geostatistical count data. Australian and New Zealand Journal of Statistics, 58:47-69.
Kazianka, H. and Pilz, J. (2010) Copula-based geostatistical modeling of continuous and discrete data including covariates. Stoch Environ Res Risk Assess 24:661-673.
Kazianka, H. (2013) Approximate copula-based estimation and prediction of discrete spatial data. Stoch Environ Res Risk Assess 27:2015-2026.
Masarotto, G. and Varin, C. (2012) Gaussian copula marginal regression. Electronic Journal of Statistics 6:1517-1549. https://projecteuclid.org/euclid.ejs/1346421603.
Masarotto, G. and Varin C. (2017). Gaussian Copula Regression in R. Journal of Statistical Software, 77(8), 1–26. doi:10.18637/jss.v077.i08.
Han, Z. and De Oliveira, V. (2018) gcKrig: An R Package for the Analysis of Geostatistical Count Data Using Gaussian Copulas. Journal of Statistical Software, 87(13), 1–32. doi:10.18637/jss.v087.i13.
## Not run: ## For fast check predict at four locations only data(Weed95) weedobs <- Weed95[Weed95$dummy==1, ] weedpred <- Weed95[Weed95$dummy==0, ] predweed1 <- predgc(obs.y = weedobs$weedcount, obs.x = weedobs[,4:5], obs.locs = weedobs[,1:2], pred.x = weedpred[1:4,4:5], pred.locs = weedpred[1:4,1:2], marginal = negbin.gc(link = 'log'), pred.interval = 0.9, corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK') #summary(predweed1) #plot(predweed1) ## Time consuming examples ## Weed prediction at 200 locations using parallel programming predweed2 <- predgc(obs.y = weedobs$weedcount, obs.x = weedobs[,4:5], obs.locs = weedobs[,1:2], pred.x = weedpred[,4:5], pred.locs = weedpred[,1:2], marginal = negbin.gc(link = 'log'), corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK', pred.interval = 0.95, parallel = TRUE, paralleloptions = list(n.cores = 4)) #summary(predweed2) #plot(predweed2) ## A more time consuming example for generating a prediction map at a fine grid data(OilWell) gridstep <- seq(0.5, 30.5, length = 40) locOilpred <- data.frame(Easting = expand.grid(gridstep, gridstep)[,1], Northing = expand.grid(gridstep, gridstep)[,2]) PredOil <- predgc(obs.y = OilWell[,3], obs.locs = OilWell[,1:2], pred.locs = locOilpred, marginal = binomial.gc(link = 'logit'), corr = matern.gc(nugget = FALSE), obs.effort = 1, pred.effort = 1, method = 'GHK', parallel = TRUE, paralleloptions = list(n.cores = 4)) PredMat <- summary(PredOil) ## To generate better prediction maps library(colorspace) filled.contour(seq(0.5,30.5,length=40), seq(0.5,30.5,length=40), matrix(PredMat$predMean,40,), zlim = c(0, 1), col=rev(heat_hcl(12)), nlevels=12, xlab = "Eastings", ylab = "Northings", plot.axes = {axis(1); axis(2); points(OilWell[,1:2], col = 1, cex = 0.25 + 0.25*OilWell[,3])}) filled.contour(seq(0.5,30.5,length=40), seq(0.5,30.5,length=40), matrix(PredMat$predVar,40,), zlim = c(0, 0.3), col = rev(heat_hcl(12)), nlevels = 10, xlab = "Eastings", ylab = "Northings", plot.axes = {axis(1); axis(2); points(OilWell[,1:2], col = 1, cex = 0.25 + 0.25*OilWell[,3])}) ## End(Not run)
## Not run: ## For fast check predict at four locations only data(Weed95) weedobs <- Weed95[Weed95$dummy==1, ] weedpred <- Weed95[Weed95$dummy==0, ] predweed1 <- predgc(obs.y = weedobs$weedcount, obs.x = weedobs[,4:5], obs.locs = weedobs[,1:2], pred.x = weedpred[1:4,4:5], pred.locs = weedpred[1:4,1:2], marginal = negbin.gc(link = 'log'), pred.interval = 0.9, corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK') #summary(predweed1) #plot(predweed1) ## Time consuming examples ## Weed prediction at 200 locations using parallel programming predweed2 <- predgc(obs.y = weedobs$weedcount, obs.x = weedobs[,4:5], obs.locs = weedobs[,1:2], pred.x = weedpred[,4:5], pred.locs = weedpred[,1:2], marginal = negbin.gc(link = 'log'), corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK', pred.interval = 0.95, parallel = TRUE, paralleloptions = list(n.cores = 4)) #summary(predweed2) #plot(predweed2) ## A more time consuming example for generating a prediction map at a fine grid data(OilWell) gridstep <- seq(0.5, 30.5, length = 40) locOilpred <- data.frame(Easting = expand.grid(gridstep, gridstep)[,1], Northing = expand.grid(gridstep, gridstep)[,2]) PredOil <- predgc(obs.y = OilWell[,3], obs.locs = OilWell[,1:2], pred.locs = locOilpred, marginal = binomial.gc(link = 'logit'), corr = matern.gc(nugget = FALSE), obs.effort = 1, pred.effort = 1, method = 'GHK', parallel = TRUE, paralleloptions = list(n.cores = 4)) PredMat <- summary(PredOil) ## To generate better prediction maps library(colorspace) filled.contour(seq(0.5,30.5,length=40), seq(0.5,30.5,length=40), matrix(PredMat$predMean,40,), zlim = c(0, 1), col=rev(heat_hcl(12)), nlevels=12, xlab = "Eastings", ylab = "Northings", plot.axes = {axis(1); axis(2); points(OilWell[,1:2], col = 1, cex = 0.25 + 0.25*OilWell[,3])}) filled.contour(seq(0.5,30.5,length=40), seq(0.5,30.5,length=40), matrix(PredMat$predVar,40,), zlim = c(0, 0.3), col = rev(heat_hcl(12)), nlevels = 10, xlab = "Eastings", ylab = "Northings", plot.axes = {axis(1); axis(2); points(OilWell[,1:2], col = 1, cex = 0.25 + 0.25*OilWell[,3])}) ## End(Not run)
This function computes the (approximate) profile likelihood based confidence interval. The algorithm starts by choosing two starting points at different sides of the MLE and using an iterative process to find the approximate lower and upper bound.
## S3 method for class 'mlegc' profile(fitted, par.index, alpha = 0.05, start.point = NULL, method = 'GQT', nrep = 1000, seed = 12345, ...)
## S3 method for class 'mlegc' profile(fitted, par.index, alpha = 0.05, start.point = NULL, method = 'GQT', nrep = 1000, seed = 12345, ...)
fitted |
an object of class |
par.index |
the index of the parameter which should be profiled. |
alpha |
the significance level, default is |
start.point |
numeric vector of length 2 indicating the starting points for finding the
left and right bound. If |
method |
Two methods are implemented. If
|
nrep |
the Monte Carlo size of the importance sampling algorithm for likelihood approximation;
only need to be specified if |
seed |
seed of the pseudorandom generator used in Monte Carlo simulation;
only need to be specified if |
... |
other arguments passed. |
Lower and upper bounds of the approximate confidence interval.
Zifei Han [email protected]
Masarotto, G. and Varin, C. (2012) Gaussian copula marginal regression. Electronic Journal of Statistics 6:1517-1549. https://projecteuclid.org/euclid.ejs/1346421603.
Masarotto, G. and Varin C. (2017). Gaussian Copula Regression in R. Journal of Statistical Software, 77(8), 1–26. doi:10.18637/jss.v077.i08.
Han, Z. and De Oliveira, V. (2018) gcKrig: An R Package for the Analysis of Geostatistical Count Data Using Gaussian Copulas. Journal of Statistical Software, 87(13), 1–32. doi:10.18637/jss.v087.i13.
## Not run: data(LansingTrees) Treefit4 <- mlegc(y = LansingTrees[,3], x = LansingTrees[,4], locs = LansingTrees[,1:2], marginal = zip.gc(link = 'log'), corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK') summary(Treefit4) profile(Treefit4, 1, 0.05, method = 'GHK', nrep = 1000, seed = 12345) profile(Treefit4, 2, 0.05, method = 'GHK', nrep = 1000, seed = 12345) profile(Treefit4, 3, 0.05, method = 'GHK', nrep = 1000, seed = 12345) profile(Treefit4, 4, 0.05, method = 'GHK', nrep = 1000, seed = 12345) profile(Treefit4, 5, 0.05, method = 'GHK', nrep = 1000, seed = 12345) ## End(Not run)
## Not run: data(LansingTrees) Treefit4 <- mlegc(y = LansingTrees[,3], x = LansingTrees[,4], locs = LansingTrees[,1:2], marginal = zip.gc(link = 'log'), corr = matern.gc(kappa = 0.5, nugget = TRUE), method = 'GHK') summary(Treefit4) profile(Treefit4, 1, 0.05, method = 'GHK', nrep = 1000, seed = 12345) profile(Treefit4, 2, 0.05, method = 'GHK', nrep = 1000, seed = 12345) profile(Treefit4, 3, 0.05, method = 'GHK', nrep = 1000, seed = 12345) profile(Treefit4, 4, 0.05, method = 'GHK', nrep = 1000, seed = 12345) profile(Treefit4, 5, 0.05, method = 'GHK', nrep = 1000, seed = 12345) ## End(Not run)
Simulate geostatistical data from Gaussian copula model at given locations. This function can simulate multiple datasets simultaneously.
simgc(locs, sim.n = 1, marginal, corr, longlat = FALSE)
simgc(locs, sim.n = 1, marginal, corr, longlat = FALSE)
locs |
a numeric matrix or data frame of n-D points with row denoting points. First column is x or longitude, second column is y or latitude. The number of locations is equal to the number of rows. |
sim.n |
the number of simulation samples required. |
marginal |
an object of class |
corr |
an object of class |
longlat |
if FALSE, use Euclidean distance, if TRUE use great circle distance. Default is FALSE. |
A list of two elements:
data |
a numeric matrix with each row denoting a simulated data. |
locs |
the location of the simulated data, same as the input locs. |
Zifei Han [email protected]
grid <- seq(0.05, 0.95, by = 0.1) xloc <- expand.grid(x = grid, y = grid)[,1] yloc <- expand.grid(x = grid, y = grid)[,2] set.seed(12345) sim1 <- simgc(locs = cbind(xloc,yloc), sim.n = 10, marginal = negbin.gc(mu = 5, od = 1), corr = matern.gc(range = 0.3, kappa = 0.5, nugget = 0.1)) #plot(sim1, index = 1)
grid <- seq(0.05, 0.95, by = 0.1) xloc <- expand.grid(x = grid, y = grid)[,1] yloc <- expand.grid(x = grid, y = grid)[,2] set.seed(12345) sim1 <- simgc(locs = cbind(xloc,yloc), sim.n = 10, marginal = negbin.gc(mu = 5, od = 1), corr = matern.gc(range = 0.3, kappa = 0.5, nugget = 0.1)) #plot(sim1, index = 1)
corr.gc
The spherical correlation function in spatial statistics.
By default, range parameter is not available, so
this function is used for likelihood inference and spatial prediction in function mlegc
and predgc
.
Users need to specify if the correlation model includes a nugget effect
nugget = TRUE
or not nugget = FALSE
.
When both range and nugget parameters are given,
the function is used for simulation with function simgc
in package gcKrig
.
spherical.gc(range = NULL, nugget = TRUE)
spherical.gc(range = NULL, nugget = TRUE)
range |
a non-negative scalar of the range parameter in the spherical correlation function. |
nugget |
the nugget effect of the correlation function. If specified, it must be a scalar between 0 and 1. |
The spherical correlation function with a nugget is of the form:
when and
when
,
is distance.
An object of class corr.gc
representing the correlation component.
Zifei Han [email protected]
mlegc
Return a summary table of results from model fitting.
## S3 method for class 'mlegc' summary(object, ...)
## S3 method for class 'mlegc' summary(object, ...)
object |
an object of class |
... |
additional arguments, but currently not used. |
A table summary of the estimates, standard error, z-value and several information criteria.
Zifei Han [email protected]
predgc
Output a summary data frame.
## S3 method for class 'predgc' summary(object,...)
## S3 method for class 'predgc' summary(object,...)
object |
an object of class |
... |
further arguments. |
A table including the following information:
pred.locs |
prediction locations. |
predMean |
the expectation of the conditional predictive distribution. |
predCount |
predicted counts; the closest integer that |
predVar |
estimated variance of the prediction at prediction locations. |
predInterval.EqualTail |
equal-tail prediction interval;
computed only if |
predInterval.Shortest |
shortest length prediction interval;
computed only if |
Zifei Han [email protected]
Calculate covariance and correlation matrix.
## S3 method for class 'mlegc' vcov(object, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'mlegc' vcov(object, digits = max(3, getOption("digits") - 3), ...)
object |
an object of class |
digits |
integer indicating the number of decimal places (round) or significant digits (signif) to be used. |
... |
other arguments passed to |
The estimated variance-covariance matrix and estimated correlation matrix.
Zifei Han [email protected]
The weed species Viola Arvensis was counted within circular frames each of area 0.25 square meter except for 10 missing sites in the first row, from a 20 by 14 rectangular grid, so the total number of locations is 270. Also, the percentages of organic matter in a soil sample are collected. The data was studied by Christensen and Waagepetersen (2002) to investigate whether weed occurrence could be predicted from observations of soil texture and soil chemical properties.
data(Weed95)
data(Weed95)
A data frame with 270 observations and 6 variables.
xloc
Cartesian x-coordinate of the locations (in meter).
yloc
Cartesian y-coordinate of the locations (in meter).
weedcount
Number of weed collected at the given site.
scaled Y coord
The scaled Y coordinate with range -1 to 1 as a covariate in regression.
organic
Another chemical component indicating the organic matter of the soil.
dummy
A dummy variable taking values 0 or 1. If 0 it is treated as observed location and 1 treated as predicted location in Christensen and Waagepetersen (2002).
Christensen,O. and Waagepetersen,R. (2002) Bayesian Prediction of Spatial Count Data Using Generalized Linear Mixed Models. Biometrics, 58:280-286
data(Weed95) str(Weed95)
data(Weed95) str(Weed95)
marginal.gc
The Weibull marginal used for simulation and computing correlation in the
trans-Gaussian random field in function simgc
and corrTG
of the package gcKrig
.
It cannot be used in function mlegc
nor predgc
to make model inferences.
weibull.gc(shape = 1, scale = 1)
weibull.gc(shape = 1, scale = 1)
shape |
a positive scalar of shape parameter in the Weibull distribution. |
scale |
a positive scalar of scale parameter in the Weibull distribution. |
The Weibull distribution with shape
parameter and
scale
parameter
has density given by
An object of class marginal.gc
representing the marginal component.
Zifei Han [email protected]
marginal.gc
, beta.gc
,
binomial.gc
, gm.gc
,
gaussian.gc
, negbin.gc
,
poisson.gc
, zip.gc
marginal.gc
The zero-inflated Poisson marginal parameterized in terms of its mean and overdispersion.
By default, this function is used for likelihood inference and spatial prediction in function
mlegc
and predgc
of the package gcKrig
.
When all marginal parameters are given, the function is used for simulation and computing correlation in a
trans-Gaussian random field in function simgc
and corrTG
.
zip.gc(link = "log", mu = NULL, od = NULL)
zip.gc(link = "log", mu = NULL, od = NULL)
link |
the model link function. |
mu |
a non-negative scalar of the mean parameter. |
od |
a non-negative scalar of the overdispersion parameter. |
The zero-inflated Poisson distribution with parameters mu = a
and od = b
has density
when , and
when
Under this parameterization, , where
is the mean parameter and
is the overdispersion parameter.
For more details see Han and De Oliveira (2016).
An object of class marginal.gc
representing the marginal component.
Zifei Han [email protected]
Han, Z. and De Oliveira, V. (2016) On the correlation structure of Gaussian copula models for geostatistical count data. Australian and New Zealand Journal of Statistics, 58:47-69.
marginal.gc
, beta.gc
, binomial.gc
,
gm.gc
, gaussian.gc
,
negbin.gc
, poisson.gc
,
weibull.gc