Title: | Rasch Sampler |
---|---|
Description: | MCMC based sampling of binary matrices with fixed margins as used in exact Rasch model tests. |
Authors: | Patrick Mair [cre, aut], Reinhold Hatzinger [aut], Norman D. Verhelst [aut] |
Maintainer: | Patrick Mair <[email protected]> |
License: | GPL-2 |
Version: | 0.8-10 |
Built: | 2024-12-31 07:48:26 UTC |
Source: | CRAN |
The package implements an MCMC algorithm for sampling of
binary matrices with fixed margins complying to the Rasch model.
Its stationary distribution is uniform. The algorithm also allows
for square matrices with fixed diagonal.
Parameter estimates in the Rasch model only depend on the marginal totals of
the data matrix that is used for the estimation. From this it follows that, if the
model is valid, all binary matrices with the same marginals as the observed one
are equally likely. For any statistic of the data matrix, one can approximate
the null distribution, i.e., the distribution if the Rasch model is valid, by taking
a random sample from the collection of equally likely data matrices and constructing
the observed distribution of the statistic.
One can then simply determine the exceedence probability of the statistic in the
observed sample, and thus construct a non-parametric test of the Rasch model.
The main purpose of this package is the implementation of a methodology to build nonparametric
tests for the Rasch model.
In the context of social network theories, where the structure of binary asymmetric
relations is studied, for example,
person esteems person
, which correponds to a 1 in cell
of the associated adjacency matrix. If one wants to study
the distribution of a statistic defined on the adjacency matrix and conditional
on the marginal totals, one has to exclude the diagonal cells from consideration, i.e.,
by keeping the diagonal cells fixed at an arbitrary value. The
RaschSampler
package
has implemented an appropriate option, thus it can be also used for sampling random adjacency
matrices with given marginal totals.
The user has to supply a binary input matrix. After defining appropriate control
parameters using rsctrl
the sampling function rsampler
may be called to obtain an object of class RSmpl
which contains the
generated random matrices in encoded form. After defining an appropriate function
to operate on a binary matrix (e.g., calculate a statistic such as phi.range
)
the application of this function to the sampled matrices is performed
using rstats
. Prior to applying the user defined function, rstats
decodes the matrices packed in the RSmpl
-object.
The package also defines a utility function rsextrobj
for extracting certains parts from
the RSmpl
-object resulting in an object of class RSmplext
.
Both types of objects can be saved and reloaded for later use.
Summary methods are available to print information on these objects, as well as
on the control object RSctr
which is obtained from using
rsctrl
containing the specification for the sampling routine.
The current implementation allows for data matrices up to 4096 rows and 128 columns.
This can be changed by setting nmax
and kmax
in RaschSampler.f90
to values which are a power of 2. These values should also be changed in rserror.R
.
For convenience, we reuse the Fortran code of package version 0.8-1 which cicumvents the
compiler bug in Linux distributions of GCC 4.3. In case of compilation errors (due to a bug in Linux distributions of GCC 4.3) please use
RaschSampler.f90
from package version 0.8-1 and change nmax
and kmax
accordingly (or use GCC 4.4).
Reinhold Hatzinger, Patrick Mair, Norman D. Verhelst
Maintainer: <[email protected]>
Verhelst, N. D. (2008) An Efficient MCMC Algorithm to Sample Binary Matrices with Fixed Marginals. Psychometrika, Volume 73, Number 4
Verhelst, N. D., Hatzinger, R., and Mair, P. (2007) The Rasch Sampler. Journal of Statistical Software, Vol. 20, Issue 4, Feb 2007
Calculates the statistic, i.e., the range of
the inter-column correlations (
-coefficients) for
a binary matrix.
phi.range(mat)
phi.range(mat)
mat |
a binary matrix |
the range of the inter-column correlations
ctr <- rsctrl(burn_in = 10, n_eff = 5, step=10, seed = 123, tfixed = FALSE) mat <- matrix(sample(c(0,1), 50, replace = TRUE), nr = 10) rso <- rsampler(mat, ctr) rso_st <- rstats(rso,phi.range) print(unlist(rso_st))
ctr <- rsctrl(burn_in = 10, n_eff = 5, step=10, seed = 123, tfixed = FALSE) mat <- matrix(sample(c(0,1), 50, replace = TRUE), nr = 10) rso <- rsampler(mat, ctr) rso_st <- rstats(rso,phi.range) print(unlist(rso_st))
The function implements an MCMC algorithm for sampling of binary matrices with fixed margins complying to the Rasch model. Its stationary distribution is uniform. The algorithm also allows for square matrices with fixed diagonal.
rsampler(inpmat, controls = rsctrl())
rsampler(inpmat, controls = rsctrl())
inpmat |
A binary (data) matrix with |
controls |
An object of class |
rsampler
is a wrapper function for a Fortran routine to generate binary random matrices based
on an input matrix.
On output the generated binary matrices are integer encoded. For further
processing of the generated matrices use the function rstats
.
A list of class RSmpl
with components
n |
number of rows of the input matrix |
k |
number of columns of the input matrix |
inpmat |
the input matrix |
tfixed |
|
burn_in |
length of the burn in process |
n_eff |
number of generated matrices (effective matrices) |
step |
controls the number number of void matrices generated in the the burn in
process and when effective matrices are generated (see note
in |
seed |
starting value for the random number generator |
n_tot |
number of matrices in |
outvec |
vector of encoded random matrices |
ier |
error code |
An element of outvec
is a four byte (or 32 bits) integer. The matrices
to be output are stored bitwise (some bits
are unused, since a integer is used for every row of a matrix. So
the number of integers per row needed equals (k+31)/32 (integer division),
which is one to four in the present implementation since the number of columns
and rows must not exceed 128 and 4096, respectively.
The summary method (summary.RSmpl
) prints
information on the content of the output object.
Reinhold Hatzinger, Norman Verhelst
Verhelst, N. D. (2008) An Efficient MCMC Algorithm to Sample Binary Matrices with Fixed Marginals. Psychometrika, Volume 73, Number 4
data(xmpl) ctr<-rsctrl(burn_in=10, n_eff=5, step=10, seed=0, tfixed=FALSE) res<-rsampler(xmpl,ctr) summary(res)
data(xmpl) ctr<-rsctrl(burn_in=10, n_eff=5, step=10, seed=0, tfixed=FALSE) res<-rsampler(xmpl,ctr) summary(res)
The object of class RSctr
represents the control parameter
specification for the sampling function rsampler
.
A legitimate RSctr
object is a list with components
burn_in |
the number of matrices to be sampled to come close to a stationary distribution. |
n_eff |
the number of effective matrices, i.e.,
the number of matrices
to be generated by the sampling function |
step |
controls the number number of void matrices generated in the the burn in
process and when effective matrices are generated (see note
in |
seed |
is the indicator for the seed of the random number generator.
If the value of seed at equals zero, a seed is generated
by the sampling function |
tfixed |
|
This object is returned from function
rsctrl
.
This class has a method for the generic summary
function.
Various parameters that control aspects of the random generation of binary matrices.
rsctrl(burn_in = 100, n_eff = 100, step = 16, seed = 0, tfixed = FALSE)
rsctrl(burn_in = 100, n_eff = 100, step = 16, seed = 0, tfixed = FALSE)
burn_in |
the number of sampled matrices to
come close to a stationary distribution.
The default is |
n_eff |
the number of effective matrices, i.e.,
the number of matrices
to be generated by the sampling function |
step |
controls the number number of void matrices generated in the the burn in
process and when effective matrices are generated (see note
below). The default is |
seed |
is the indicator for the seed of the random number generator.
Its value must be in the range 0 and 2147483646 (2**31-2).
If the value of seed equals zero, a seed is generated
by the sampling function |
tfixed |
logical, – specifies if in case of a quadratic input
matrix the diagonal is considered fixed (see note below).
The default is |
A list of class RSctr
with components
burn_in
, n_eff
, step
,
seed
, tfixed
.,
If one of the components is incorrectly specified
the error function rserror
is called and some informations are printed. The ouput object
will not be defined.
The specification of step
controls the sampling algorithm as follows:
If , e.g., burn_in = 10
, n_eff = 5
, and step = 2
,
then during the burn in period step * burn_in = 2 * 10
matrices are generated. After that, n_eff * step = 5 * 2
matrices
are generated and every second matrix of these last ten is returned from
link{rsampler}
.tfixed
has no effect if the input matrix is not quadratic,
i.e., all matrix elements are considered free (unrestricted).
If the input matrix is quadratic, and tfixed = TRUE
,
the main diagonal of the matrix is considered as fixed.
On return from link{rsampler}
all diagonal elements
of the generated matrices are set to zero.
This specification applies, e.g.,
to analyzing square incidence matrices
representing binary asymmetric relation
in social network theory.
The summary method (summary.RSctr
) prints
the current definitions.
ctr <- rsctrl(n_eff = 1, seed = 987654321) # specify new controls summary(ctr) ## Not run: ctr2 <- rsctrl(step = -3, n_eff = 10000) # incorrect specifications ## End(Not run)
ctr <- rsctrl(n_eff = 1, seed = 987654321) # specify new controls summary(ctr) ## Not run: ctr2 <- rsctrl(step = -3, n_eff = 10000) # incorrect specifications ## End(Not run)
Convenience function to extract a matrix.
rsextrmat(RSobj, mat.no = 1)
rsextrmat(RSobj, mat.no = 1)
RSobj |
object as obtained from using |
mat.no |
number of the matrix to extract from the sample object. |
One of the matrices (either the original or a sampled matrix)
ctr <- rsctrl(burn_in = 10, n_eff = 3, step=10, seed = 0, tfixed = FALSE) mat <- matrix(sample(c(0,1), 50, replace = TRUE), nr = 10) all_m <- rsampler(mat, ctr) summary(all_m) # extract the third sampled matrix (here the fourth) third_m <- rsextrmat(all_m, 4) head(third_m)
ctr <- rsctrl(burn_in = 10, n_eff = 3, step=10, seed = 0, tfixed = FALSE) mat <- matrix(sample(c(0,1), 50, replace = TRUE), nr = 10) all_m <- rsampler(mat, ctr) summary(all_m) # extract the third sampled matrix (here the fourth) third_m <- rsextrmat(all_m, 4) head(third_m)
Utility function to extract some of the generated matrices, still in encoded form.
rsextrobj(RSobj, start = 1, end = 8192)
rsextrobj(RSobj, start = 1, end = 8192)
RSobj |
object as obtained from using |
start |
number of the matrix to start with. When specifying 1 (the default value) the original input matrix is included in the output object. |
end |
last matrix to be extracted. If |
A list of class RSmpl
with components
n |
number of rows of the input matrix |
k |
number of columns of the input matrix |
inpmat |
the input matrix |
tfixed |
|
burn_in |
length of the burn in process |
n_eff |
number of generated matrices (effective matrices) |
step |
controls the number number of void matrices generated in the burn in
process and when effective matrices are generated (see note
in |
seed |
starting value for the random number generator |
n_tot |
number of matrices in |
outvec |
vector of encoded random matrices |
ier |
error code |
By default, all generated matrices plus
the original matrix (in position 1) are contained in
outvec
, thus n_tot = n_eff + 1
. If
the original matrix is not in outvec
then
n_tot = n_eff
.
For saving and loading objects
of class RSobj
see the example below.
For extracting a decoded (directly usable) matrix use rsextrmat
.
ctr <- rsctrl(burn_in = 10, n_eff = 3, step=10, seed = 0, tfixed = FALSE) mat <- matrix(sample(c(0,1), 50, replace = TRUE), nr = 10) all_m <- rsampler(mat, ctr) summary(all_m) some_m <- rsextrobj(all_m, 1, 2) summary(some_m) ## Not run: save(some_m, file = "some.RSobj") some_new <- load("some.RSobj") summary(some_new) ## End(Not run)
ctr <- rsctrl(burn_in = 10, n_eff = 3, step=10, seed = 0, tfixed = FALSE) mat <- matrix(sample(c(0,1), 50, replace = TRUE), nr = 10) all_m <- rsampler(mat, ctr) summary(all_m) some_m <- rsextrobj(all_m, 1, 2) summary(some_m) ## Not run: save(some_m, file = "some.RSobj") some_new <- load("some.RSobj") summary(some_new) ## End(Not run)
The objects of class RSmpl
and RSmplext
contain
the original input matrix, the generated (encoded) random matrices, and
some information about the sampling process.
A list of class RSmpl
or RSmplext
with components
n |
number of rows of the input matrix |
k |
number of columns of the input matrix |
inpmat |
the input matrix |
tfixed |
|
burn_in |
length of the burn in process |
n_eff |
number of generated matrices (effective matrices) |
step |
controls the number number of void matrices generated in the the burn in
process and when effective matrices are generated (see note
in |
seed |
starting value for the random number generator |
n_tot |
number of matrices in |
outvec |
vector of encoded random matrices |
ier |
error code (see below) |
These classes of objects are returned from
rsampler
and rsextrobj
.
Both classes have methods for the generic summary
function.
By default, all generated matrices plus
the original matrix (in position 1) are contained in
outvec
, thus n_tot = n_eff + 1
. If
the original matrix is not in outvec
then
n_tot = n_eff
.
If ier
is 0, no error was detected. Otherwise use
the error function rserror(ier)
to obtain some informations.
For saving and loading objects
of class RSmpl
or RSmplext
see the example in rsextrobj
.
This function is used to calculate user defined statistics for the (original and) sampled matrices. A user defined function has to be provided.
rstats(RSobj, userfunc, ...)
rstats(RSobj, userfunc, ...)
RSobj |
|
userfunc |
a user defined function which performs operations on the (original and) sampled matrices. The first argument in the definition of the user function must be an object of type matrix. |
... |
further arguments, that are passed to the user function |
A list of objects as specified in the user supplied function
The encoded matrices that are contained in the
input object RSobj
are decoded and passed to the user function in turn.
If RSobj
is not an object obtained from either rsampler
or rsextrobj
or
no user function is specified an error message is printed.
A simple user function, phi.range
, is included in
the RaschSampler package for demonstration purposes.
rstats
can be used to obtain the 0/1 values for any
of the sampled matrices (see second example below). Please note,
that the output from the user function is stored in a list where
the number of components corresponds to the number of matrices passed
to the user function (see third example).
ctr <- rsctrl(burn_in = 10, n_eff = 5, step=10, seed = 12345678, tfixed = FALSE) mat <- matrix(sample(c(0,1), 50, replace = TRUE), nr = 10) rso <- rsampler(mat, ctr) rso_st <- rstats(rso,phi.range) unlist(rso_st) # extract the third generated matrix # (here, the first is the input matrix) # and decode it into rsmat rso2 <- rsextrobj(rso,4,4) summary(rso2) rsmat <- rstats(rso2, function(x) matrix(x, nr = rso2$n)) print(rsmat[[1]]) # extract only the first r rows of the third generated matrix mat<-function(x, nr = nr, r = 3){ m <- matrix(x, nr = nr) m[1:r,] } rsmat2 <- rstats(rso2, mat, nr=rso$n, r = 3) print(rsmat2[[1]]) # apply a user function to the decoded object print(phi.range(rsmat[[1]]))
ctr <- rsctrl(burn_in = 10, n_eff = 5, step=10, seed = 12345678, tfixed = FALSE) mat <- matrix(sample(c(0,1), 50, replace = TRUE), nr = 10) rso <- rsampler(mat, ctr) rso_st <- rstats(rso,phi.range) unlist(rso_st) # extract the third generated matrix # (here, the first is the input matrix) # and decode it into rsmat rso2 <- rsextrobj(rso,4,4) summary(rso2) rsmat <- rstats(rso2, function(x) matrix(x, nr = rso2$n)) print(rsmat[[1]]) # extract only the first r rows of the third generated matrix mat<-function(x, nr = nr, r = 3){ m <- matrix(x, nr = nr) m[1:r,] } rsmat2 <- rstats(rso2, mat, nr=rso$n, r = 3) print(rsmat2[[1]]) # apply a user function to the decoded object print(phi.range(rsmat[[1]]))
Prints the current definitions for the sampling function.
## S3 method for class 'RSctr' summary(object, ...)
## S3 method for class 'RSctr' summary(object, ...)
object |
object of class |
... |
potential further arguments (ignored) |
ctr <- rsctrl(n_eff = 1, seed = 123123123) # specify controls summary(ctr)
ctr <- rsctrl(n_eff = 1, seed = 123123123) # specify controls summary(ctr)
Prints a summary list for sample objects of class RSmpl
and RSmplext
.
## S3 method for class 'RSmpl' summary(object, ...) ## S3 method for class 'RSmplext' summary(object, ...)
## S3 method for class 'RSmpl' summary(object, ...) ## S3 method for class 'RSmplext' summary(object, ...)
object |
object as obtained from |
... |
potential further arguments (ignored) |
Describes the status of an sample object.
ctr <- rsctrl(burn_in = 10, n_eff = 3, step=10, seed = 0, tfixed = FALSE) mat <- matrix(sample(c(0,1), 50, replace = TRUE), nr = 10) all_m <- rsampler(mat, ctr) summary(all_m) some_m <- rsextrobj(all_m, 1, 2) summary(some_m)
ctr <- rsctrl(burn_in = 10, n_eff = 3, step=10, seed = 0, tfixed = FALSE) mat <- matrix(sample(c(0,1), 50, replace = TRUE), nr = 10) all_m <- rsampler(mat, ctr) summary(all_m) some_m <- rsextrobj(all_m, 1, 2) summary(some_m)
Ficitious data sets - matrices with binary responses
data(xmpl)
data(xmpl)
The format of xmpl
is:
300 rows (referring to subjects)
30 columns (referring to items)
The format of xmplbig
is:
4096 rows (referring to subjects)
128 columns (referring to items) xmplbig
has the maximum dimensions that the RaschSampler package
can handle currently.
data(xmpl) print(head(xmpl))
data(xmpl) print(head(xmpl))