Title: | Weakly Associated Vectors (WAVE) Sampling |
---|---|
Description: | Spatial data are generally auto-correlated: two units that are close to each other are likely to share similar properties. For this reason, a sample drawn from such a population often needs to be well spread over space. A new method to draw a sample from a population with spatial coordinates is proposed. This method is called wave (Weakly Associated Vectors) sampling. It uses the vector least correlated with a spatial weights matrix to update the inclusion probabilities vector into a sample. For more details see Raphaël Jauslin and Yves Tillé (2019) <doi:10.1007/s13253-020-00407-1>. |
Authors: | Raphaël Jauslin [aut, cre] , Yves Tillé [aut] |
Maintainer: | Raphaël Jauslin |
License: | GPL (>= 2) |
Version: | 0.1.3 |
Built: | 2024-12-16 07:00:25 UTC |
Source: | CRAN |
Calculate the squared Euclidean distance from unit $k$ to the other units.
distUnitk(X, k, tore, toreBound)
X | matrix representing the spatial coordinates. |
k | the unit index to be used. |
tore | an optional logical value indicating whether the distance is computed on a torus. See Details. |
toreBound | an optional numeric value that specifies the length of the torus. |
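Let $\mathbf{x}_k$ and $\mathbf{x}_l$ be the spatial coordinates of the units $k, l \in U$. The classical squared Euclidean distance is given by

$$d^2(k,l) = (\mathbf{x}_k - \mathbf{x}_l)^\top (\mathbf{x}_k - \mathbf{x}_l).$$

When the points are distributed on a regular $N_1 \times N_2$ grid of $\mathbb{R}^2$, the units can be treated as if they were placed on a torus. This can be illustrated by Pac-Man passing through the wall to get away from the ghosts. Specifically, two units in the same column (resp. row) that lie on opposite edges of the grid are considered to be a small distance apart:

$$d_T^2(k,l) = \min\left((x_{k_1} - x_{l_1})^2,\ (x_{k_1} + N_1 - x_{l_1})^2,\ (x_{k_1} - N_1 - x_{l_1})^2\right) + \min\left((x_{k_2} - x_{l_2})^2,\ (x_{k_2} + N_2 - x_{l_2})^2,\ (x_{k_2} - N_2 - x_{l_2})^2\right).$$

The option toreBound specifies the length of the torus in the case $N_1 = N_2$. It is ignored if the tore option is equal to FALSE.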
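The two distances can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper name `sq_dist_from_unit` is hypothetical, it is not the package's `distUnitk`, and it assumes the same torus length along every coordinate.

```r
# Hypothetical helper (not the package's distUnitk): squared distances from
# unit k to every unit, optionally wrapping each coordinate on a torus.
sq_dist_from_unit <- function(X, k, tore = FALSE, toreBound = NA) {
  # Coordinate-wise differences between every unit and unit k (N x p matrix).
  delta <- sweep(X, 2, X[k, ], "-")
  if (tore) {
    # On a torus of length toreBound, keep the smallest of the three
    # candidate squared offsets for each coordinate.
    delta2 <- pmin(delta^2, (delta + toreBound)^2, (delta - toreBound)^2)
  } else {
    delta2 <- delta^2
  }
  rowSums(delta2)
}

x <- seq(1, 5, 1)
X <- as.matrix(expand.grid(x, x))   # 5 x 5 regular grid, unit 1 at (1, 1)
d_plain <- sq_dist_from_unit(X, k = 1)
d_torus <- sq_dist_from_unit(X, k = 1, tore = TRUE, toreBound = 5)
```

On this grid, unit 5 sits at (5, 1): its squared distance to unit 1 is 16 in the plane but only 1 on the torus, because the two units are on opposite edges of the same row.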
a vector of length $N$ that contains the squared distances from the unit $k$ to all other units.
Raphaël Jauslin
# 5 x 5 regular grid
N <- 5
x <- seq(1, N, 1)
X <- as.matrix(expand.grid(x, x))
# distances from unit 2, with and without the torus option
distUnitk(X, k = 2, tore = TRUE, toreBound = 5)
distUnitk(X, k = 2, tore = FALSE, toreBound = -1)
This function implements the spreading measure based on Moran's $I$ index.
IB(W, s)
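W | a stratification matrix inheriting from a sparse matrix class, such as the one returned by wpik. |
s | a vector of size $N$ with elements equal to 0 or 1. The value 1 indicates that the unit is selected, while the value 0 indicates that it is not. |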
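This index was developed by Tillé et al. (2018) and measures the spreading of a sample drawn from a population. It uses a corrected version of the traditional Moran's $I$ index. Each row of the matrix $\mathbf{W}$ should represent a stratum. Each stratum is defined by a particular unit and its neighbouring units. See wpik.

The spatial balance measure is equal to

$$I_B = \frac{(\mathbf{s} - \bar{\mathbf{s}}_w)^\top \mathbf{W} (\mathbf{s} - \bar{\mathbf{s}}_w)}{\sqrt{(\mathbf{s} - \bar{\mathbf{s}}_w)^\top \mathbf{D} (\mathbf{s} - \bar{\mathbf{s}}_w) \; (\mathbf{s} - \bar{\mathbf{s}}_w)^\top \mathbf{B} (\mathbf{s} - \bar{\mathbf{s}}_w)}},$$

where $\mathbf{D}$ is the diagonal matrix containing the $w_i$,

$$\bar{\mathbf{s}}_w = \mathbf{1} \frac{\mathbf{s}^\top \mathbf{W} \mathbf{1}}{\mathbf{1}^\top \mathbf{W} \mathbf{1}}$$

and

$$\mathbf{B} = \mathbf{W}^\top \mathbf{D}^{-1} \mathbf{W} - \frac{\mathbf{W}^\top \mathbf{1} \mathbf{1}^\top \mathbf{W}}{\mathbf{1}^\top \mathbf{W} \mathbf{1}}.$$

The spatial weights are specified through the argument W.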
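The index can be sketched in base R with a dense matrix. The helper name `ib_sketch` is hypothetical and this is not the package's `IB`; the sketch assumes that the $w_i$ on the diagonal of $\mathbf{D}$ are the row sums of $\mathbf{W}$ and that every row sum is positive.

```r
# Hypothetical helper (not the package's IB): spatial balance index from a
# dense weights matrix W (rows = strata) and a 0/1 sample indicator s.
ib_sketch <- function(W, s) {
  one <- rep(1, nrow(W))
  w <- as.vector(W %*% one)            # assumption: w_i are the row sums of W
  total <- sum(w)                      # 1' W 1
  s_bar <- sum(s * w) / total          # s' W 1 / 1' W 1
  z <- s - s_bar                       # centred sample indicator
  D <- diag(w)
  B <- t(W) %*% diag(1 / w) %*% W - (t(W) %*% one %*% t(one) %*% W) / total
  num <- as.numeric(t(z) %*% W %*% z)
  den <- sqrt(as.numeric(t(z) %*% D %*% z) * as.numeric(t(z) %*% B %*% z))
  num / den
}

# Toy population: 6 units on a ring, each stratum holds the two adjacent units.
N <- 6
W <- matrix(0, N, N)
for (i in 1:N) {
  W[i, c(if (i == 1) N else i - 1, if (i == N) 1 else i + 1)] <- 1
}
ib_spread    <- ib_sketch(W, c(1, 0, 1, 0, 1, 0))  # every other unit selected
ib_clustered <- ib_sketch(W, c(1, 1, 1, 0, 0, 0))  # three adjacent units selected
```

In this toy setting the alternating sample reaches the lower end of the range, while the sample made of three adjacent units gives a positive value.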
A numeric value that represents the spatial balance. It could be any real value between -1 (spread) and 1 (clustered).
Raphaël Jauslin
Tillé, Y., Dickson, M.M., Espa, G., and Giuliani, D. (2018). Measuring the spatial balance of a sample: A new measure based on Moran's I index. Spatial Statistics, 23, 182-192.
N <- 36
n <- 12
x <- seq(1, sqrt(N), 1)
X <- expand.grid(x, x)
pik <- rep(n/N, N)
W <- wpik(as.matrix(X), pik, bound = 1, tore = TRUE, shift = FALSE, toreBound = sqrt(N))
W <- W - diag(diag(W))   # remove the diagonal of the weights matrix
s <- wave(as.matrix(X), pik, tore = TRUE, shift = TRUE, comment = TRUE)
IB(W, s)
Calculates the $v_k$ values of the spatial balance developed by Stevens and Olsen (2004) and suggested by Grafström et al. (2012).
sb_vk(pik, X, s)
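pik | vector of the inclusion probabilities. The length should be equal to $N$. |
X | matrix representing the spatial coordinates. |
s | a vector of size $N$ with elements equal to 0 or 1. The value 1 indicates that the unit is selected, while the value 0 indicates that it is not. |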
The spatial balance measure based on the Voronoï polygons is defined by

$$B(S) = \frac{1}{n} \sum_{k \in S} (v_k - 1)^2.$$

The function returns the $v_k$ values and is mainly based on the function sb of the package BalancedSampling (Grafström and Lisic, 2019).
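The $v_k$ values can be sketched in base R by assigning every unit to its nearest selected unit and summing the inclusion probabilities inside each Voronoï polygon. The helper name `vk_sketch` is hypothetical; it is not `sb_vk` and does not call BalancedSampling. As a simplification, a unit at equal distance from several selected units is given entirely to the first one.

```r
# Hypothetical helper (not sb_vk itself): v_k for each selected unit k is the
# sum of the inclusion probabilities of the units whose nearest selected unit is k.
vk_sketch <- function(pik, X, s) {
  sel <- which(s == 1)
  v <- numeric(length(pik))
  for (i in seq_along(pik)) {
    # Squared distances from unit i to every selected unit.
    d2 <- colSums((t(X[sel, , drop = FALSE]) - X[i, ])^2)
    nearest <- sel[which.min(d2)]      # simplification: ties go to the first
    v[nearest] <- v[nearest] + pik[i]
  }
  v
}

X <- matrix(1:6, ncol = 1)             # six units on a line
pik <- rep(2 / 6, 6)
s <- c(0, 1, 0, 0, 1, 0)               # units 2 and 5 selected
v <- vk_sketch(pik, X, s)
B <- mean((v[s == 1] - 1)^2)           # spatial balance measure B(S)
```

Here each selected unit collects three units with probability 1/3 each, so both $v_k$ are equal to 1 and the sample is perfectly balanced in the sense of $B(S)$.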
A vector of size $N$ with elements equal to the $v_k$ values. If a unit is not selected, its value is equal to 0.
Raphaël Jauslin
Grafström, A., Lundström, N.L.P. and Schelin, L. (2012). Spatially balanced sampling through the Pivotal method. Biometrics, 68(2), 514-520
Grafström, A., Lisic J. (2019). BalancedSampling: Balanced and Spatially Balanced Sampling. R package version 1.5.5. https://CRAN.R-project.org/package=BalancedSampling
Stevens, D. L. Jr. and Olsen, A. R. (2004). Spatially balanced sampling of natural resources. Journal of the American Statistical Association 99, 262-278
N <- 50
n <- 10
X <- as.matrix(cbind(runif(N), runif(N)))
pik <- rep(n/N, N)
s <- wave(X, pik)
v <- sb_vk(pik, X, s)
Estimator of the variance of the Horvitz-Thompson estimator. It is based on the variance estimator of the conditional Poisson sampling design. See Tillé (2020, Chapter 5) for more information.
varHAJ(y, pik, s)
y | vector of size |
pik | vector of the inclusion probabilities. The length should be equal to $N$. |
s | index vector of size |
The function computes the following quantity :
This estimator is well defined for maximum entropy sampling designs and uses only first-order inclusion probabilities.
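A Hájek-type estimator of this kind is commonly written as $\frac{n}{n-1} \sum_{k \in S} (1 - \pi_k) \left( y_k/\pi_k - \hat{B} \right)^2$ with $\hat{B} = \sum_{l \in S} (1 - \pi_l)\, y_l/\pi_l \,/ \sum_{l \in S} (1 - \pi_l)$. The sketch below assumes that form; the helper name `var_hajek_sketch` is hypothetical, it is not the package's `varHAJ`, and it assumes that `y` and `pik` cover the whole population while `s` holds the indices of the selected units.

```r
# Hypothetical helper (not the package's varHAJ): Hajek-type variance estimator
# that only needs first-order inclusion probabilities.
# Assumptions: y and pik have one entry per population unit, s holds the
# indices of the selected units.
var_hajek_sketch <- function(y, pik, s) {
  ys <- y[s]
  ps <- pik[s]
  n <- length(s)
  c_k <- 1 - ps
  # Weighted mean of the expanded values y_k / pi_k over the sample.
  b_hat <- sum(c_k * ys / ps) / sum(c_k)
  n / (n - 1) * sum(c_k * (ys / ps - b_hat)^2)
}

y <- c(2, 4, 6, 8)
pik <- rep(0.5, 4)
v_hat <- var_hajek_sketch(y, pik, s = c(1, 3))

# When y is exactly proportional to pik, every y_k / pi_k is identical and the
# estimated variance is zero.
v_zero <- var_hajek_sketch(3 * pik, pik, s = c(1, 3))
```

With equal inclusion probabilities, as in the first call, this form reduces to the usual variance estimator for simple random sampling without replacement.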
A number, the estimated variance.
Tillé, Y. (2020). Sampling and estimation from finite populations. New York: Wiley
Select a spread sample from inclusion probabilities using the weakly associated vectors sampling method.
wave(X, pik, bound = 1, tore = FALSE, shift = FALSE, toreBound = -1, comment = FALSE, fixedSize = TRUE)
X | matrix representing the spatial coordinates. |
pik | vector of the inclusion probabilities. The length should be equal to N. |
bound | a scalar representing the bound to reach. See Details. Default is 1. |
tore | an optional logical value indicating whether the distance is computed on a torus. See Details. Default is FALSE. |
shift | an optional logical value indicating whether a shift perturbation is used. See Details. Default is FALSE. |
toreBound | a numeric value that specifies the size of the grid. Default is -1. |
comment | an optional logical value indicating whether some information is printed during the execution. Default is FALSE. |
fixedSize | an optional logical value indicating whether a fixed sample size is imposed. Default is TRUE. |
The main idea is derived from the cube method (Deville and Tillé, 2004). At each step, the inclusion probabilities vector pik is randomly modified. This modification is carried out in a direction that best preserves the spreading of the sample.

A stratification matrix $\mathbf{A}$ is computed from the spatial weights matrix calculated by the function wpik. The vector giving the direction is selected differently depending on whether $\mathbf{A}$ is full rank or not.

If the matrix $\mathbf{A}$ is not full rank, a vector contained in the right null space is selected:

$$\mathrm{Null}(\mathbf{A}) = \{\mathbf{x} \in \mathbb{R}^N \mid \mathbf{A}\mathbf{x} = \mathbf{0}\}.$$

If the matrix $\mathbf{A}$ is full rank, we find $\mathbf{v}$, $\mathbf{u}$, the singular vectors associated with the smallest singular value $\sigma$ of $\mathbf{A}$, such that

$$\mathbf{A}\mathbf{v} = \sigma\mathbf{u}, \qquad \mathbf{A}^\top\mathbf{u} = \sigma\mathbf{v}.$$

The vector $\mathbf{v}$ is then centered to ensure a fixed sample size. At each step, the inclusion probabilities are modified and at least one component is set to 0 or 1. The matrix $\mathbf{A}$ is updated from the new inclusion probabilities. The whole procedure is repeated until only one component that is not equal to 0 or 1 remains.
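Both ways of choosing the direction can be sketched with a single singular value decomposition, because a right singular vector whose singular value is zero lies in the right null space. The helper name `weakest_vector` and the tolerance `tol` are assumptions made for this illustration; the package's own routine may proceed differently.

```r
# Hypothetical helper: direction vector taken from a square stratification
# matrix A, using base R's svd().
weakest_vector <- function(A, tol = 1e-10) {
  sv <- svd(A)
  idx <- which.min(sv$d)               # position of the smallest singular value
  list(v = sv$v[, idx],                # right singular vector (the direction)
       u = sv$u[, idx],                # matching left singular vector
       sigma = sv$d[idx],
       full_rank = sv$d[idx] > tol)    # FALSE when v lies in the null space
}

A_full <- matrix(c(2, 0, 0, 1), 2, 2)  # full rank, smallest singular value 1
w_full <- weakest_vector(A_full)

A_sing <- matrix(1, 2, 2)              # rank 1, so a null-space vector exists
w_sing <- weakest_vector(A_sing)
```

For `A_full` the returned pair satisfies the two singular-vector relations; for `A_sing` the product of the matrix with the returned direction is numerically zero.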
For more information on the options tore and toreBound, see distUnitk. If tore is set to TRUE and toreBound is not specified, then toreBound is equal to

$$N^{1/p}$$

where $p$ is equal to the number of columns of the matrix X.

For more information on the option shift, see wpik.

If fixedSize is equal to TRUE, the weakest associated vector is centered at each step of the algorithm. This ensures that the size of the selected sample is equal to the sum of the inclusion probabilities.
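The effect of centering can be illustrated with one random update in the style of the cube method. This is a sketch under assumptions, not the package's code: the helper name `wave_step_sketch` is hypothetical, every component of `pik` is assumed to lie strictly between 0 and 1, and the centered direction is assumed not to be the zero vector.

```r
# Hypothetical helper: one random update of pik along a direction u.
wave_step_sketch <- function(pik, u, fixedSize = TRUE) {
  if (fixedSize) u <- u - mean(u)      # centring makes sum(u) equal to 0
  # Largest steps that keep every component of pik + l1 * u and pik - l2 * u
  # inside [0, 1]; components with u == 0 never limit the step.
  up   <- ifelse(u > 0, (1 - pik) / u, ifelse(u < 0, -pik / u, Inf))
  down <- ifelse(u > 0, pik / u, ifelse(u < 0, -(1 - pik) / u, Inf))
  l1 <- min(up)
  l2 <- min(down)
  # Moving by +l1 with probability l2 / (l1 + l2) keeps the expectation of the
  # updated vector equal to pik.
  if (runif(1) < l2 / (l1 + l2)) pik + l1 * u else pik - l2 * u
}

set.seed(1)
pik <- rep(0.4, 5)
new_pik <- wave_step_sketch(pik, u = c(1, -2, 0.5, 3, -1))
```

Because the centered direction sums to zero, `sum(new_pik)` equals `sum(pik)` up to rounding, and at least one component of `new_pik` has reached 0 or 1.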
A vector of size $N$ with elements equal to 0 or 1. The value 1 indicates that the unit is selected, while the value 0 indicates that it is not.
Raphaël Jauslin
Deville, J. C. and Tillé, Y. (2004). Efficient balanced sampling: the cube method. Biometrika, 91(4), 893-912
#------------
# Example 2D
#------------
N <- 50
n <- 15
pik <- rep(n/N, N)
X <- as.matrix(cbind(runif(N), runif(N)))
s <- wave(X, pik)

#------------
# Example 2D grid
#------------
N <- 36 # 6 x 6 grid
n <- 12 # number of units selected
x <- seq(1, sqrt(N), 1)
X <- as.matrix(cbind(rep(x, times = sqrt(N)), rep(x, each = sqrt(N))))
pik <- rep(n/N, N)
s <- wave(X, pik, tore = TRUE, shift = FALSE)

#------------
# Example 1D
#------------
N <- 100
n <- 10
X <- as.matrix(seq(1, N, 1))
pik <- rep(n/N, N)
s <- wave(X, pik, tore = TRUE, shift = FALSE, comment = TRUE)
The stratification matrix is calculated from the inclusion probabilities. It takes the distances between units into account. See Details.
wpik(X, pik, bound = 1, tore = FALSE, shift = FALSE, toreBound = -1)
X | matrix representing the spatial coordinates. |
pik | vector of the inclusion probabilities. The length should be equal to $N$. |
bound | a scalar representing the bound to reach. Default is 1. |
tore | an optional logical value indicating whether the distance is computed on a torus. Default is FALSE. |
shift | an optional logical value indicating whether a shift perturbation is used. See Details for more information. Default is FALSE. |
toreBound | a numeric value that specifies the size of the grid. Default is -1. |
Entries of the stratification matrix indicate how close the units are to each other. Hence a large value $w_{kl}$ means that the unit $k$ is close to the unit $l$. This function considers that a unit represents its neighbours until their inclusion probabilities sum up to bound.

We define $G_k$ as the set of the nearest neighbours of the unit $k$, including $k$, such that the sum of their inclusion probabilities is just greater than bound. Moreover, let $g_k = \#G_k$ be the number of elements in $G_k$. The matrix $\mathbf{W}$ is then defined as follows:

$w_{kl} = \pi_l$ if unit $l$ is in the set of the $g_k - 1$ nearest neighbours of $k$.
$w_{kl} = \pi_l + 1 - \sum_{t \in G_k} \pi_t$ if unit $l$ is the $g_k$-th nearest neighbour of $k$.
$w_{kl} = 0$ otherwise.

Hence, the $k$th row of the matrix represents the neighbourhood or stratum of the unit $k$, such that the inclusion probabilities sum up to 1, and the $k$th column represents the weights that unit $k$ takes for each stratum.

The option shift adds a small normally distributed perturbation rnorm(0,0.01) to the coordinates of the centroid of the stratum considered. This can be useful if many units are at the same distance. Indeed, if two units are at the same distance and are the last units before the bound is reached, then the weights of both units are updated. If a shift perturbation is used, then all the distances are different and only one unit's weight is updated such that the bound is reached.

The shift perturbation is generated at the beginning of the procedure such that each stratum is shifted by the same perturbation.
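The construction of one row of the weights matrix can be sketched in base R. The helper name `wpik_row_sketch` is hypothetical and this is not the package's `wpik`: the sketch uses plain Euclidean distances, ignores the tore and shift options, breaks ties by unit order instead of sharing the weight, and assumes that the inclusion probabilities sum to at least `bound`.

```r
# Hypothetical helper (not the package's wpik): row k of the weights matrix.
wpik_row_sketch <- function(X, pik, k, bound = 1) {
  d2 <- rowSums(sweep(X, 2, X[k, ], "-")^2)   # squared distances to unit k
  ord <- order(d2)                            # nearest first, unit k itself leads
  cum <- cumsum(pik[ord])
  # Size of the neighbourhood: first position where the cumulated inclusion
  # probabilities reach the bound (small tolerance for rounding).
  g <- which(cum >= bound - 1e-9)[1]
  w <- numeric(length(pik))
  inner <- ord[seq_len(g - 1)]
  w[inner] <- pik[inner]                      # full weight for the first g - 1 units
  w[ord[g]] <- pik[ord[g]] + bound - cum[g]   # last unit only takes the remainder
  w
}

X <- matrix(1:10, ncol = 1)                   # ten units on a line
pik <- rep(0.3, 10)
w1 <- wpik_row_sketch(X, pik, k = 1)
```

For unit 1, the three nearest units (itself included) keep their weight 0.3 and the fourth receives the remaining 0.1, so the row sums to the bound.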
A sparse matrix representing the spatial weights.
Raphaël Jauslin
N <- 25
n <- 5
X <- as.matrix(cbind(runif(N), runif(N)))
pik <- rep(n/N, N)
W <- wpik(X, pik)
The stratification matrix is calculated from the inverse inclusion probabilities. It is a direct implementation of the spatial weights specified in Tillé et al. (2018).
wpikInv(X, pik, tore = FALSE, shift = FALSE, toreBound = -1)
X | matrix representing the spatial coordinates. |
pik | vector of the inclusion probabilities. The length should be equal to N. |
tore | an optional logical value indicating whether the distance is computed on a torus. Default is FALSE. |
shift | an optional logical value indicating whether a shift perturbation is used. See Details for more information. Default is FALSE. |
toreBound | a numeric value that specifies the size of the grid. Default is -1. |
Entries of the stratification matrix indicate how close the units are to each other. Hence a large value $w_{ij}$ means that the unit $i$ is close to the unit $j$. This function considers that if unit $i$ were selected in the sample drawn from the population, then $i$ would represent $1/\pi_i$ units in the population and, as a consequence, it would be natural to consider that $i$ has $k_i = 1/\pi_i - 1$ neighbours in the population. The $k_i$ neighbours are the nearest neighbours of $i$ according to distances.

The weights are thus calculated as follows:

$w_{ij} = 1$ if unit $j$ is among the $\lfloor k_i \rfloor$ nearest neighbours of $i$.
$w_{ij} = k_i - \lfloor k_i \rfloor$ if unit $j$ is the $\lceil k_i \rceil$-th nearest neighbour of $i$.
$w_{ij} = 0$ otherwise.

$\lfloor k_i \rfloor$ and $\lceil k_i \rceil$ are the inferior and the superior integers of $k_i$.
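These weights can be sketched in base R for a single unit. The helper name `wpik_inv_row_sketch` is hypothetical and this is not the package's `wpikInv`: the sketch uses plain Euclidean distances, assumes that the unit is not counted among its own neighbours, breaks ties by unit order, and assumes that the population holds enough neighbours.

```r
# Hypothetical helper (not the package's wpikInv): row i of the weights matrix.
wpik_inv_row_sketch <- function(X, pik, i) {
  k_i <- 1 / pik[i] - 1                       # number of neighbours of unit i
  d2 <- rowSums(sweep(X, 2, X[i, ], "-")^2)   # squared distances to unit i
  ord <- setdiff(order(d2), i)                # other units, nearest first
  w <- numeric(length(pik))
  w[ord[seq_len(floor(k_i))]] <- 1            # full weight for the first floor(k_i)
  if (ceiling(k_i) > floor(k_i)) {
    # The next neighbour only receives the fractional part of k_i.
    w[ord[ceiling(k_i)]] <- k_i - floor(k_i)
  }
  w
}

X <- matrix(1:6, ncol = 1)                    # six units on a line
pik <- rep(0.4, 6)                            # k_i = 1 / 0.4 - 1 = 1.5
w1 <- wpik_inv_row_sketch(X, pik, i = 1)
```

With an inclusion probability of 0.4, unit 1 has 1.5 neighbours: unit 2 gets weight 1 and unit 3 gets weight 0.5, so the row sums to $k_i$.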
The option shift adds a small normally distributed perturbation rnorm(0,0.01) to the coordinates of the centroid of the stratum considered. This can be useful if many units are at the same distance. Indeed, if two units are at the same distance and are the last units before the bound is reached, then the weights of both units are updated. If a shift perturbation is used, then all the distances are different and only one unit's weight is updated such that the bound is reached.

The shift perturbation is generated at the beginning of the procedure such that each stratum is shifted by the same perturbation.
A sparse matrix representing the spatial weights.
Raphaël Jauslin
Tillé, Y., Dickson, M.M., Espa, G., and Giuliani, D. (2018). Measuring the spatial balance of a sample: A new measure based on Moran's I index. Spatial Statistics, 23, 182-192.
N <- 25
n <- 5
X <- as.matrix(cbind(runif(N), runif(N)))
pik <- rep(n/N, N)
W <- wpikInv(X, pik)