Package 'WaveSampling' reference manual

Title:	Weakly Associated Vectors (WAVE) Sampling
Description:	Spatial data are generally auto-correlated, meaning that if two units selected are close to each other, then it is likely that they share the same properties. For this reason, when sampling in the population it is often needed that the sample is well spread over space. A new method to draw a sample from a population with spatial coordinates is proposed. This method is called wave (Weakly Associated Vectors) sampling. It uses the less correlated vector to a spatial weights matrix to update the inclusion probabilities vector into a sample. For more details see Raphaël Jauslin and Yves Tillé (2019) <doi:10.1007/s13253-020-00407-1>.
Authors:	Raphaël Jauslin [aut, cre] , Yves Tillé [aut]
Maintainer:	Raphaël Jauslin <[email protected]>
License:	GPL (>= 2)
Version:	0.1.4
Built:	2025-03-10 06:43:05 UTC
Source:	CRAN

Squared Euclidean distances of the unit k.

Description

Calculate the squared Euclidean distance from unit $k$ to the other units.

Usage

distUnitk(X, k, tore, toreBound)
distUnitk(X, k, tore, toreBound)

Arguments

`X`	matrix representing the spatial coordinates.
`k`	the unit index to be used.
`tore`	an optional logical value, if we are considering the distance on a tore. See Details.
`toreBound`	an optional numeric value that specify the length of the tore.

Details

Let $\mathbf{x}_k,\mathbf{x}_l$ be the spatial coordinates of the unit $k,l \in U$ . The classical euclidean distance is given by

$d^2(k,l) = (\mathbf{x}_k - \mathbf{x}_l)^\top (\mathbf{x}_k - \mathbf{x}_l).$

When the points are distributed on a $N_1 \times N_2$ regular grid of $R^2$ . It is possible to consider the units like they were placed on a tore. It can be illustrated by Pac-Man passing through the wall to get away from ghosts. Specifically, we could consider two units on the same column (resp. row) that are on the opposite have a small distance,

$d^2_T(k,l) = min( (x_{k_1} - x_{l_1})^2, (x_{k_1} + N_1 - x_{l_1})^2, (x_{k_1} - N_1 - x_{l_1})^2) +$

$min( (x_{k_2} - x_{l_2})^2, (x_{k_2} + N_2 - x_{l_2})^2, (x_{k_2} - N_2 - x_{l_2})^2).$

The option toreBound specify the length of the tore in the case of $N_1 = N_2 = N$ . It is omitted if the tore option is equal to FALSE.

Value

a vector of length $N$ that contains the distances from the unit $k$ to all other units.

Examples

  N <- 5
  x <- seq(1,N,1)
  X <- as.matrix(expand.grid(x,x))
  distUnitk(X,k = 2,tore = TRUE,toreBound = 5)
  distUnitk(X,k = 2,tore = FALSE,toreBound = -1)
N <- 5
  x <- seq(1,N,1)
  X <- as.matrix(expand.grid(x,x))
  distUnitk(X,k = 2,tore = TRUE,toreBound = 5)
  distUnitk(X,k = 2,tore = FALSE,toreBound = -1)

Spreading measure based on Moran's $I$ index

Description

This function implements the spreading measure based on Moran's $I$ index.

Usage

IB(W, s)
IB(W, s)

Arguments

`W`	a stratification matrix inheriting from `sparseMatrix` that represents the spatial weights. See `wpik`.
`s`	a vector of size $N$ with elements equal to 0 or 1. The value 1 indicates that the unit is selected while the value 0 is for non-chosen units.

Details

This index is developped by Tillé et al. (2018) and measure the spreading of a sample drawn from a population. It uses a corrected version of the traditional Moran's $I$ index. Each row of the matrix $\bf W$ should represents a stratum. Each stratum is defined by a particular unit and its neighbouring units. See wpik. The spatial balance measure is equal to

$I_B =\frac{( \bf s- \bar{s}_w)^\top W ( s- \bar{s}_w)}{\bf \sqrt{( s- \bar{s}_w)^\top D ( s- \bar{s}_w) ( s- \bar{s}_w)^\top B ( s- \bar{s}_w)}},$

where $\bf D$ is the diagonal matrix containing the $w_i$ ,

$\bf \bar{s}_w = 1 \frac{ s^\top W 1}{ 1^\top W 1}$

and

$\bf B = W^\top D^{-1} W - \frac{ W^\top 1 1^\top W}{1^\top W 1}.$

To specify the spatial weights uses the argument W.

Value

A numeric value that represents the spatial balance. It could be any real value between -1 (spread) and 1 (clustered).

References

Tillé, Y., Dickson, M.M., Espa, G., and Guiliani, D. (2018). Measuring the spatial balance of a sample: A new measure based on Moran's I index. Spatial Statistics, 23, 182-192.

Examples

  N <- 36
  n <- 12
  x <- seq(1,sqrt(N),1)
  X <- expand.grid(x,x)
  pik <- rep(n/N,N)
  W <- wpik(as.matrix(X),pik,bound = 1,tore = TRUE,shift = FALSE,toreBound = sqrt(N))
  W <- W - diag(diag(W))
  s <- wave(as.matrix(X),pik,tore = TRUE,shift = TRUE,comment = TRUE)
  IB(W,s)

N <- 36
  n <- 12
  x <- seq(1,sqrt(N),1)
  X <- expand.grid(x,x)
  pik <- rep(n/N,N)
  W <- wpik(as.matrix(X),pik,bound = 1,tore = TRUE,shift = FALSE,toreBound = sqrt(N))
  W <- W - diag(diag(W))
  s <- wave(as.matrix(X),pik,tore = TRUE,shift = TRUE,comment = TRUE)
  IB(W,s)

Values $v_k$ to compute the Spatial balance

Description

Calculates the $v_k$ values of the spatial balance developped by Stevens and Olsen (2004) and suggested by Grafström et al. (2012).

Usage

sb_vk(pik, X, s)
sb_vk(pik, X, s)

Arguments

`pik`	vector of the inclusion probabilities. The length should be equal to $N$ .
`X`	matrix representing the spatial coordinates.
`s`	A vector of size $N$ with elements equal 0 or 1. The value 1 indicates that the unit is selected while the value 0 is for non-chosen unit.

Details

The spatial balance measure based on the Voronoï polygons is defined by

$B(S) = \frac{1}{n}\sum_{k\in S} (v_k -1)^2 .$

The function return the $v_k$ values and is mainly based on the function sb of the package BalancedSampling Grafström and Lisic (2019).

Value

A vector of size $N$ with elements equal to the $v_k$ values. If the unit is not selected then the value is equal to 0.

References

Grafström, A., Lundström, N.L.P. and Schelin, L. (2012). Spatially balanced sampling through the Pivotal method. Biometrics, 68(2), 514-520

Grafström, A., Lisic J. (2019). BalancedSampling: Balanced and Spatially Balanced Sampling. R package version 1.5.5. https://CRAN.R-project.org/package=BalancedSampling

Stevens, D. L. Jr. and Olsen, A. R. (2004). Spatially balanced sampling of natural resources. Journal of the American Statistical Association 99, 262-278

Examples

N <- 50
n <- 10
X <- as.matrix(cbind(runif(N),runif(N)))
pik <- rep(n/N,N)
s <- wave(X,pik)
v <- sb_vk(pik,X,s)

N <- 50
n <- 10
X <- as.matrix(cbind(runif(N),runif(N)))
pik <- rep(n/N,N)
s <- wave(X,pik)
v <- sb_vk(pik,X,s)

Hajek-Rosen variance estimator

Description

Estimator of the variance of the Horvitz-Thompson estimator. It is based on the variance estimator of the conditional Poisson sampling design. See Tillé (2020, Chapter 5) for more informations.

Usage

varHAJ(y, pik, s)
varHAJ(y, pik, s)

Arguments

`y`	vector of size $n$ that represent the variable of interest.
`pik`	vector of the inclusion probabilities. The length should be equal to $n$ .
`s`	index vector of size $n$ with elements equal to the selected units.

Details

The function computes the following quantity :

$v_{HAJ}(\widehat{Y}_{HT}) = \frac{n}{n-1} \sum_{k\in S} (1-\pi_k)\left( \frac{y_k}{\pi_k}-\frac{ \sum_{l\in S} (1-\pi_k)/\pi_k }{\sum_{l\in S} (1-\pi_k) } \right)^2 .$

This estimator is well-defined for maximum entropy sampling design and use only inclusion probabilities of order one.

Value

A number, the variance.

References

Tillé, Y. (2020). Sampling and estimation from finite populations. New York: Wiley

Weakly associated vectors sampling

Description

Select a spread sample from inclusion probabilities using the weakly associated vectors sampling method.

Usage

wave(
  X,
  pik,
  bound = 1,
  tore = FALSE,
  shift = FALSE,
  toreBound = -1,
  comment = FALSE,
  fixedSize = TRUE
)
wave(
  X,
  pik,
  bound = 1,
  tore = FALSE,
  shift = FALSE,
  toreBound = -1,
  comment = FALSE,
  fixedSize = TRUE
)

Arguments

`X`	matrix representing the spatial coordinates.
`pik`	vector of the inclusion probabilities. The length should be equal to N.
`bound`	a scalar representing the bound to reach. See Details. Default is 1.
`tore`	an optional logical value, if we are considering the distance on a tore. See Details. Default is `TRUE`.
`shift`	an optional logical value, if you would use a shift perturbation. See Details. Default is `FALSE`.
`toreBound`	a numeric value that specify the size of the grid. Default is -1.
`comment`	an optional logical value, indicating some informations during the execution. Default is `FALSE`.
`fixedSize`	an optional logical value, if you would impose a fixed sample size. Default is `TRUE`

Details

The main idea is derived from the cube method (Devill and Tillé, 2004). At each step, the inclusion probabilities vector pik is randomly modified. This modification is carried out in a direction that best preserves the spreading of the sample.

A stratification matrix $\bf A$ is computed from the spatial weights matrix calculated from the function wpik. Depending if $\bf A$ is full rank or not, the vector giving the direction is not selected in the same way.

If matrix $\bf A$ is not full rank, a vector that is contained in the right null space is selected:

$Null(\bf A) = \{ \bf x \in R^N | \bf A\bf x = \bf 0 \}.$

If matrix $\bf A$ is full rank, we find $\bf v$ , $\bf u$ the singular vectors associated to the smallest singular value $\sigma$ of $\bf A$ such that

$\bf A\bf v = \sigma \bf u,~~~~ \bf A^\top \bf u = \sigma \bf v.$

Vector $\bf v$ is then centered to ensure fixed sample size. At each step, inclusion probabilities is modified and at least on component is set to 0 or 1. Matrix $\bf A$ is updated from the new inclusion probabilities. The whole procedure it repeated until it remains only one component that is not equal to 0 or 1.

For more informations on the options tore and toreBound, see distUnitk. If tore is set up TRUE and toreBound not specified the toreBound is equal to

$N^{1/p}$

where $p$ is equal to the number of column of the matrix X.

For more informations on the option shift, see wpik.

If fixedSize is equal TRUE, the weakest associated vector is centered at each step of the algorithm. This ensures that the size of the selected sample is equal to the sum of the inclusion probabilities.

Value

A vector of size $N$ with elements equal 0 or 1. The value 1 indicates that the unit is selected while the value 0 is for non-chosen unit.

References

Deville, J. C. and Tillé, Y. (2004). Efficient balanced sampling: the cube method. Biometrika, 91(4), 893-912

Examples


#------------
# Example 2D
#------------

N <- 50
n <- 15
pik <- rep(n/N,N)
X <- as.matrix(cbind(runif(N),runif(N)))
s <- wave(X,pik)

#------------
# Example 2D grid 
#------------

N <- 36 # 6 x 6 grid
n <- 12 # number of unit selected
x <- seq(1,sqrt(N),1)
X <- as.matrix(cbind(rep(x,times = sqrt(N)),rep(x,each = sqrt(N))))
pik <- rep(n/N,N)
s <- wave(X,pik, tore = TRUE,shift = FALSE)

#------------
# Example 1D
#------------

N <- 100
n <- 10
X <- as.matrix(seq(1,N,1))
pik <- rep(n/N,N)
s <- wave(X,pik,tore = TRUE,shift =FALSE,comment = TRUE)


#------------
# Example 2D
#------------

N <- 50
n <- 15
pik <- rep(n/N,N)
X <- as.matrix(cbind(runif(N),runif(N)))
s <- wave(X,pik)

#------------
# Example 2D grid 
#------------

N <- 36 # 6 x 6 grid
n <- 12 # number of unit selected
x <- seq(1,sqrt(N),1)
X <- as.matrix(cbind(rep(x,times = sqrt(N)),rep(x,each = sqrt(N))))
pik <- rep(n/N,N)
s <- wave(X,pik, tore = TRUE,shift = FALSE)

#------------
# Example 1D
#------------

N <- 100
n <- 10
X <- as.matrix(seq(1,N,1))
pik <- rep(n/N,N)
s <- wave(X,pik,tore = TRUE,shift =FALSE,comment = TRUE)

Stratification matrix from inclusion probabilities

Description

The stratification matrix is calculated from the inclusion probabilities. It takes the distances between units into account. See Details.

Usage

wpik(X, pik, bound = 1, tore = FALSE, shift = FALSE, toreBound = -1)
wpik(X, pik, bound = 1, tore = FALSE, shift = FALSE, toreBound = -1)

Arguments

`X`	matrix representing the spatial coordinates.
`pik`	vector of the inclusion probabilities. The length should be equal to $N$ .
`bound`	a scalar representing the bound to reach. Default is 1.
`tore`	an optional logical value, if we are considering the distance on a tore. Default is `FALSE`.
`shift`	an optional logical value, if you would use a shift perturbation. See Details for more informations. Default is `FALSE`.
`toreBound`	a numeric value that specify the size of the grid. Default is -1.

Details

We define $G_k$ the set of the nearest neighbor of the unit $k$ including $k$ such that the sum of their inclusion probabilities is just greater than bound. Moreover, let $g_k = \#{G_k}$ , the number of elements in $G_k$ . The matrix $\bf W$ is then defined as follows,

$w_{kl} = \pi_l$ if unit $l$ is in the set of the $g_k - 1$ nearest neighbor of $k$ .
$w_{kl} = \pi_l + 1 - (\sum_{t \in G_k} \pi_t)$ if unit $l$ is the $g_k$ nearest neighbour of $k$ .
$w_{kl} = 0$ otherwise.

Hence, the $k$ th row of the matrix represents neighborhood or stratum of the unit such that the inclusion probabilities sum up to 1 and the $k$ th column the weights that unit $k$ takes for each stratum.

The option shift add a small normally distributed perturbation rnorm(0,0.01) to the coordinates of the centroid of the stratum considered. This could be useful if there are many unit that have the same distances. Indeed, if two units have the same distance and are the last unit before that the bound is reached, then the weights of the both units is updated. If a shift perturbation is used then all the distances are differents and only one unit weight is update such that the bound is reached.

The shift perturbation is generated at the beginning of the procedure such that each stratum is shifted by the same perturbation.

Value

A sparse matrix representing the spatial weights.

Examples


N <- 25
n <- 5
X <- as.matrix(cbind(runif(N),runif(N)))
pik <- rep(n/N,N)
W <- wpik(X,pik)

N <- 25
n <- 5
X <- as.matrix(cbind(runif(N),runif(N)))
pik <- rep(n/N,N)
W <- wpik(X,pik)

Stratification matrix from inverse inclusion probabilities

Description

The stratification matrix is calculated from the inverse inclusion probabilities. It is a direct implementation of the spatial weights specified in Tillé et al., (2018).

Usage

wpikInv(X, pik, tore = FALSE, shift = FALSE, toreBound = -1)
wpikInv(X, pik, tore = FALSE, shift = FALSE, toreBound = -1)

Arguments

`X`	matrix representing the spatial coordinates.
`pik`	vector of the inclusion probabilities. The length should be equal to N.
`tore`	an optional logical value, if we are considering the distance on a tore. Default is `FALSE`.
`shift`	an optional logical value, if you would use a shift perturbation. See Details for more informations. Default is `FALSE`.
`toreBound`	a numeric value that specify the size of the grid. Default is -1.

Details

Entries of the stratification matrix indicates how the units are close from each others. Hence a large value $w_{kl}$ means that the unit $k$ is close to the unit $l$ . This function considers that if unit $k$ were selected in the sample drawn from the population then $k$ would represent $1/\pi_k$ units in the population and, as a consequence, it would be natural to consider that $k$ has $n_k = (1/\pi_k - 1)$ neighbours in the population. The $n_k$ neighbours are the nearest neighbours of $k$ according to distances. The weights are so calculated as follows :

$w_{kl} = 1$ if unit $l \in N_{\lfloor n_k \rfloor }$
$w_{kl} = n_k - \lfloor n_k \rfloor$ if unit $l$ is the $\lceil n_k \rceil$ nearest neighbour of $k$ .
$w_{kl} = 0$ otherwise.

$\lfloor n_k \rfloor$ and $\lceil n_k \rceil$ are the inferior and the superior integers of $n_k$ .

The shift perturbation is generated at the beginning of the procedure such that each stratum is shifted by the same perturbation.

Value

A sparse matrix representing the spatial weights.

References

Tillé, Y., Dickson, M.M., Espa, G., and Guiliani, D. (2018). Measuring the spatial balance of a sample: A new measure based on Moran's I index. Spatial Statistics, 23, 182-192.

Examples


N <- 25
n <- 5
X <- as.matrix(cbind(runif(N),runif(N)))
pik <- rep(n/N,N)
W <- wpikInv(X,pik)

N <- 25
n <- 5
X <- as.matrix(cbind(runif(N),runif(N)))
pik <- rep(n/N,N)
W <- wpikInv(X,pik)

Package 'WaveSampling'

Help Index

Squared Euclidean distances of the unit k.

Description

Usage

Arguments

Details

Value

See Also

Examples

Spreading measure based on Moran's III index

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Values vkv_kvk​ to compute the Spatial balance

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Hajek-Rosen variance estimator

Description

Usage

Arguments

Details

Value

References

Weakly associated vectors sampling

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Stratification matrix from inclusion probabilities

Description

Usage

Arguments

Details

Value

See Also

Examples

Stratification matrix from inverse inclusion probabilities

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Spreading measure based on Moran's $I$ index

Values $v_k$ to compute the Spatial balance