Title: | Fast Computation of some Matrices Useful in Statistics |
---|---|
Description: | Small set of functions to fast computation of some matrices and operations useful in statistics and econometrics. Currently, there are functions for efficient computation of duplication, commutation and symmetrizer matrices with minimal storage requirements. Some commonly used matrix decompositions (LU and LDL), basic matrix operations (for instance, Hadamard, Kronecker products and the Sherman-Morrison formula) and iterative solvers for linear systems are also available. In addition, the package includes a number of common statistical procedures such as the sweep operator, weighted mean and covariance matrix using an online algorithm, linear regression (using Cholesky, QR, SVD, sweep operator and conjugate gradients methods), ridge regression (with optimal selection of the ridge parameter considering several procedures), omnibus tests for univariate normality, functions to compute the multivariate skewness, kurtosis, the Mahalanobis distance (checking the positive defineteness), and the Wilson-Hilferty transformation of gamma variables. Furthermore, the package provides interfaces to C code callable by another C code from other R packages. |
Authors: | Felipe Osorio [aut, cre] , Alonso Ogueda [aut] |
Maintainer: | Felipe Osorio <[email protected]> |
License: | GPL-3 |
Version: | 0.5-7721 |
Built: | 2024-11-07 06:14:40 UTC |
Source: | CRAN |
Multiplication of 3-dimensional arrays was first introduced by Bates and Watts (1980). More extensions and technical details can be found in Wei (1998).
array.mult(a, b, x)
array.mult(a, b, x)
a |
a numeric matrix. |
b |
a numeric matrix. |
x |
a three-dimensional array. |
Let be a 3-dimensional
where
indices
and
indicate face, row and column, respectively. The
product
is an
array, with
and
are
and
matrices
respectively. The elements of
are defined as:
array.mult
returns a 3-dimensional array of dimension .
Bates, D.M., Watts, D.G. (1980). Relative curvature measures of nonlinearity. Journal of the Royal Statistical Society, Series B 42, 1-25.
Wei, B.C. (1998). Exponential Family Nonlinear Models. Springer, New York.
x <- array(0, dim = c(2,3,3)) # 2 x 3 x 3 array x[,,1] <- c(1,2,2,4,3,6) x[,,2] <- c(2,4,4,8,6,12) x[,,3] <- c(3,6,6,12,9,18) a <- matrix(1, nrow = 2, ncol = 3) b <- matrix(1, nrow = 3, ncol = 2) y <- array.mult(a, b, x) # a 2 x 2 x 2 array y
x <- array(0, dim = c(2,3,3)) # 2 x 3 x 3 array x[,,1] <- c(1,2,2,4,3,6) x[,,2] <- c(2,4,4,8,6,12) x[,,3] <- c(3,6,6,12,9,18) a <- matrix(1, nrow = 2, ncol = 3) b <- matrix(1, nrow = 3, ncol = 2) y <- array.mult(a, b, x) # a 2 x 2 x 2 array y
Force a square matrix x
to be symmetric
asSymmetric(x, lower = TRUE)
asSymmetric(x, lower = TRUE)
x |
a square matrix to be forced to be symmetric. |
lower |
logical, should the upper (lower) triangle be replaced with the lower (upper) triangle? |
a square symmetric matrix.
a <- matrix(1:16, ncol = 4) isSymmetric(a) # FALSE a <- asSymmetric(a) # copy lower triangle into upper triangle
a <- matrix(1:16, ncol = 4) isSymmetric(a) # FALSE a <- asSymmetric(a) # copy lower triangle into upper triangle
Computes the Bezier curve based on control points using the De Casteljau's method.
bezier(x, y, ngrid = 200)
bezier(x, y, ngrid = 200)
x , y
|
vector giving the coordinates of the control points. Missing values are deleted. |
ngrid |
number of elements in the grid used to compute the smoother. |
Given control points the Bezier curve is given by
defined as
where . To evaluate the Bezier curve the De Casteljau's method is used.
A list containing xgrid
and ygrid
elements used to plot the Bezier curve.
# a tiny example x <- c(1.0, 0.25, 1.25, 2.5, 4.00, 5.0) y <- c(0.5, 2.00, 3.75, 4.0, 3.25, 1.0) plot(x, y, type = "o") z <- bezier(x, y, ngrid = 50) lines(z$xgrid, z$ygrid, lwd = 2, lty = 2, col = "red") # other simple example x <- c(4,6,4,5,6,7) y <- 1:6 plot(x, y, type = "o") z <- bezier(x, y, ngrid = 50) lines(z$xgrid, z$ygrid, lwd = 2, lty = 2, col = "red")
# a tiny example x <- c(1.0, 0.25, 1.25, 2.5, 4.00, 5.0) y <- c(0.5, 2.00, 3.75, 4.0, 3.25, 1.0) plot(x, y, type = "o") z <- bezier(x, y, ngrid = 50) lines(z$xgrid, z$ygrid, lwd = 2, lty = 2, col = "red") # other simple example x <- c(4,6,4,5,6,7) y <- 1:6 plot(x, y, type = "o") z <- bezier(x, y, ngrid = 50) lines(z$xgrid, z$ygrid, lwd = 2, lty = 2, col = "red")
Bracket product of a matrix and a 3-dimensional array.
bracket.prod(a, x)
bracket.prod(a, x)
a |
a numeric matrix. |
x |
a three-dimensional array. |
Let be a 3-dimensional
array and
an
matrix, then
is called the bracket product of
and
, that is an
with elements
bracket.prod
returns a 3-dimensional array of dimension .
Wei, B.C. (1998). Exponential Family Nonlinear Models. Springer, New York.
x <- array(0, dim = c(2,3,3)) # 2 x 3 x 3 array x[,,1] <- c(1,2,2,4,3,6) x[,,2] <- c(2,4,4,8,6,12) x[,,3] <- c(3,6,6,12,9,18) a <- matrix(1, nrow = 3, ncol = 2) y <- bracket.prod(a, x) # a 3 x 3 x 3 array y
x <- array(0, dim = c(2,3,3)) # 2 x 3 x 3 array x[,,1] <- c(1,2,2,4,3,6) x[,,2] <- c(2,4,4,8,6,12) x[,,3] <- c(3,6,6,12,9,18) a <- matrix(1, nrow = 3, ncol = 2) y <- bracket.prod(a, x) # a 3 x 3 x 3 array y
Conjugate gradients (CG) method is an iterative algorithm for solving linear systems with positive definite coefficient matrices.
cg(a, b, maxiter = 200, tol = 1e-7)
cg(a, b, maxiter = 200, tol = 1e-7)
a |
a symmetric positive definite matrix containing the coefficients of the linear system. |
b |
a vector of right-hand sides of the linear system. |
maxiter |
the maximum number of iterations. Defaults to |
tol |
tolerance level for stopping iterations. |
a vector with the approximate solution, the iterations performed are returned as the attribute 'iterations'.
The underlying C
code does not check for symmetry nor positive definitiveness.
Golub, G.H., Van Loan, C.F. (1996). Matrix Computations, 3rd Edition. John Hopkins University Press.
Hestenes, M.R., Stiefel, E. (1952). Methods of conjugate gradients for solving linear equations. Journal of Research of the National Bureau of Standards 49, 409-436.
a <- matrix(c(4,3,0,3,4,-1,0,-1,4), ncol = 3) b <- c(24,30,-24) z <- cg(a, b) z # converged in 3 iterations
a <- matrix(c(4,3,0,3,4,-1,0,-1,4), ncol = 3) b <- c(24,30,-24) z <- cg(a, b) z # converged in 3 iterations
function cholupdate
, where R = chol(A)
is the original Cholesky
factorization of , returns the upper triangular Cholesky factor of
, with
a column vector of appropriate dimension.
cholupdate(r, x)
cholupdate(r, x)
r |
a upper triangular matrix, the Cholesky factor of matrix a. |
x |
vector defining the rank one update. |
Golub, G.H., Van Loan, C.F. (2013). Matrix Computations, 4th Edition. John Hopkins University Press.
a <- matrix(c(1,1,1,1,2,3,1,3,6), ncol = 3) r <- chol(a) x <- c(0,0,1) b <- a + outer(x,x) r1 <- cholupdate(r, x) r1 all(r1 == chol(b)) # TRUE
a <- matrix(c(1,1,1,1,2,3,1,3,6), ncol = 3) r <- chol(a) x <- c(0,0,1) b <- a + outer(x,x) r1 <- cholupdate(r, x) r1 all(r1 == chol(b)) # TRUE
Forms a symmetric circulant matrix using a backwards shift of its first column
circulant(x)
circulant(x)
x |
the first column to form the circulant matrix. |
A symmetric circulant matrix.
x <- c(2,3,5,7,11,13) circulant(x)
x <- c(2,3,5,7,11,13) circulant(x)
This function provides the minimum information required to create the commutation matrix.
The commutation matrix is a square matrix of order that, for an
matrix
, transform
vec
) to
vec
.
comm.info(m = 1, n = m, condensed = TRUE)
comm.info(m = 1, n = m, condensed = TRUE)
m |
a positive integer row dimension. |
n |
a positive integer column dimension. |
condensed |
logical. Information should be returned in compact form? |
This function returns a list containing two vectors that represent an element of
the commutation matrix and is accesed by the indexes in vectors row
and col
.
This information is used by function comm.prod
to do some operations
involving the commutation matrix without forming it. This information also can be
obtained using function commutation
.
A list containing the following elements:
row |
vector of indexes, each entry represents the row index of the commutation matrix. |
col |
vector of indexes, each entry represents the column index of the commutation
matrix. Only present if |
m |
positive integer, row dimension. |
n |
positive integer, column dimension. |
Magnus, J.R., Neudecker, H. (1979). The commutation matrix: some properties and applications. The Annals of Statistics 7, 381-394.
z <- comm.info(m = 3, n = 2, condensed = FALSE) z # where are the ones in commutation matrix of order '3,2'? K32 <- commutation(m = 3, n = 2, matrix = TRUE) K32 # only recommended if m and n are very small
z <- comm.info(m = 3, n = 2, condensed = FALSE) z # where are the ones in commutation matrix of order '3,2'? K32 <- commutation(m = 3, n = 2, matrix = TRUE) K32 # only recommended if m and n are very small
Given the row and column dimensions of a commutation matrix of order
and a conformable matrix
, performs one of the matrix-matrix
operations:
, if
side = "left"
and transposed = FALSE
, or
, if
side = "left"
and transposed = TRUE
, or
, if
side = "right"
and transposed = FALSE
, or
, if
side = "right"
and transposed = TRUE
.
The main aim of comm.prod
is to do this matrix multiplication without forming
the commutation matrix.
comm.prod(m = 1, n = m, x = NULL, transposed = FALSE, side = "left")
comm.prod(m = 1, n = m, x = NULL, transposed = FALSE, side = "left")
m |
a positive integer row dimension. |
n |
a positive integer column dimension. |
x |
numeric matrix (or vector). |
transposed |
logical. Commutation matrix should be transposed? |
side |
a string selecting if commutation matrix is pre-multiplying |
Underlying Fortran
code only uses information provided by comm.info
to performs the matrix multiplication. The commutation matrix is never created.
K42 <- commutation(m = 4, n = 2, matrix = TRUE) x <- matrix(1:24, ncol = 3) y <- K42 %*% x z <- comm.prod(m = 4, n = 2, x) # K42 is not stored all(z == y) # matrices y and z are equal!
K42 <- commutation(m = 4, n = 2, matrix = TRUE) x <- matrix(1:24, ncol = 3) y <- K42 %*% x z <- comm.prod(m = 4, n = 2, x) # K42 is not stored all(z == y) # matrices y and z are equal!
This function returns the commutation matrix of order which transforms,
for an
matrix
,
vec
to
vec
.
commutation(m = 1, n = m, matrix = FALSE, condensed = FALSE)
commutation(m = 1, n = m, matrix = FALSE, condensed = FALSE)
m |
a positive integer row dimension. |
n |
a positive integer column dimension. |
matrix |
a logical indicating whether the commutation matrix will be returned. |
condensed |
logical. Information should be returned in compact form? |
This function is a wrapper function for the function comm.info
. This function
provides the minimum information required to create the commutation matrix. If option
matrix = FALSE
the commutation matrix is stored in two vectors containing the
coordinate list of indexes for rows and columns. Option condensed = TRUE
only
returns vector of indexes for the rows of commutation matrix.
Warning: matrix = TRUE
is not recommended, unless the order m
and n
be small. This matrix can require a huge amount of storage.
Returns an by
matrix (if requested).
Magnus, J.R., Neudecker, H. (1979). The commutation matrix: some properties and applications. The Annals of Statistics 7, 381-394.
Magnus, J.R., Neudecker, H. (2007). Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd Edition. Wiley, New York.
z <- commutation(m = 100, condensed = TRUE) object.size(z) # 40.6 Kb of storage z <- commutation(m = 100, condensed = FALSE) object.size(z) # 80.7 Kb of storage K100 <- commutation(m = 100, matrix = TRUE) # time: < 2 secs object.size(K100) # 400 Mb of storage, do not request this matrix! # a small example K32 <- commutation(m = 3, n = 2, matrix = TRUE) a <- matrix(1:6, ncol = 2) v <- K32 %*% vec(a) all(vec(t(a)) == as.vector(v)) # vectors are equal!
z <- commutation(m = 100, condensed = TRUE) object.size(z) # 40.6 Kb of storage z <- commutation(m = 100, condensed = FALSE) object.size(z) # 80.7 Kb of storage K100 <- commutation(m = 100, matrix = TRUE) # time: < 2 secs object.size(K100) # 400 Mb of storage, do not request this matrix! # a small example K32 <- commutation(m = 3, n = 2, matrix = TRUE) a <- matrix(1:6, ncol = 2) v <- K32 %*% vec(a) all(vec(t(a)) == as.vector(v)) # vectors are equal!
This function is a constructor for the corAR1
correlation matrix representing
an autocorrelation structure of order 1.
corAR1(rho, p = 2)
corAR1(rho, p = 2)
rho |
the value of the lag 1 autocorrelation, which must be between -1 and 1. |
p |
dimension of the requested correlation matrix. |
Returns a by
matrix.
R <- corAR1(rho = 0.8, p = 5)
R <- corAR1(rho = 0.8, p = 5)
This function is a constructor for the corCS
correlation matrix representing
a compound symmetry structure corresponding to uniform correlation.
corCS(rho, p = 2)
corCS(rho, p = 2)
rho |
the value of the correlation between any two correlated observations, which must be between -1 and 1. |
p |
dimension of the requested correlation matrix. |
Returns a by
matrix.
R <- corCS(rho = 0.8, p = 5)
R <- corCS(rho = 0.8, p = 5)
Returns a list containing the mean and covariance matrix of the data.
cov.MSSD(x)
cov.MSSD(x)
x |
a matrix or data frame. As usual, rows are observations and columns are variables. |
This procedure uses the Holmes-Mergen method using the difference between each successive pairs of observations also known as Mean Square Successive Method (MSSD) to estimate the covariance matrix, which is given by
A list containing the following named components:
mean |
an estimate for the center (mean) of the data. |
cov |
the estimated covariance matrix. |
Holmes, D.S., Mergen, A.E. (1993).
Improving the performance of the control chart.
Quality Engineering 5, 619-625.
x <- cbind(1:10, c(1:3, 8:5, 8:10)) z0 <- cov(x) z0 z1 <- cov.MSSD(x) z1
x <- cbind(1:10, c(1:3, 8:5, 8:10)) z0 <- cov(x) z0 z1 <- cov.MSSD(x) z1
Returns a list containing estimates of the weighted mean and covariance matrix of the data.
cov.weighted(x, weights = rep(1, nrow(x)))
cov.weighted(x, weights = rep(1, nrow(x)))
x |
a matrix or data frame. As usual, rows are observations and columns are variables. |
weights |
a non-negative and non-zero vector of weights for each
observation. Its length must equal the number of rows of |
The covariance matrix is divided by the number of observations, which arise for instance, when we use the class of elliptical contoured distributions. Thus,
This differs from the behaviour of function cov.wt
.
A list containing the following named components:
mean |
an estimate for the center (mean) of the data. |
cov |
the estimated (weighted) covariance matrix. |
Clarke, M.R.B. (1971). Algorithm AS 41: Updating the sample mean and dispersion matrix. Applied Statistics 20, 206-209.
x <- cbind(1:10, c(1:3, 8:5, 8:10)) z0 <- cov.weighted(x) # all weights are 1 D2 <- Mahalanobis(x, center = z0$mean, cov = z0$cov) p <- ncol(x) wts <- (p + 1) / (1 + D2) # nice weights! z1 <- cov.weighted(x, weights = wts) z1
x <- cbind(1:10, c(1:3, 8:5, 8:10)) z0 <- cov.weighted(x) # all weights are 1 D2 <- Mahalanobis(x, center = z0$mean, cov = z0$cov) p <- ncol(x) wts <- (p + 1) / (1 + D2) # nice weights! z1 <- cov.weighted(x, weights = wts) z1
Given the order of two duplication matrices and a conformable matrix ,
this function performs the operation:
,
where
and
are duplication matrices of order
and
, respectively.
dupl.cross(n = 1, k = n, x = NULL)
dupl.cross(n = 1, k = n, x = NULL)
n |
order of the duplication matrix used pre-multiplying |
k |
order of the duplication matrix used post-multiplying |
x |
numeric matrix, this argument is required. |
This function calls dupl.prod
to performs the matrix multiplications required
but without forming any duplication matrices.
D2 <- duplication(n = 2, matrix = TRUE) D3 <- duplication(n = 3, matrix = TRUE) x <- matrix(1, nrow = 9, ncol = 4) y <- t(D3) %*% x %*% D2 z <- dupl.cross(n = 3, k = 2, x) # D2 and D3 are not stored all(z == y) # matrices y and z are equal! x <- matrix(1, nrow = 9, ncol = 9) z <- dupl.cross(n = 3, x = x) # same matrix is used to pre- and post-multiplying x z # print result
D2 <- duplication(n = 2, matrix = TRUE) D3 <- duplication(n = 3, matrix = TRUE) x <- matrix(1, nrow = 9, ncol = 4) y <- t(D3) %*% x %*% D2 z <- dupl.cross(n = 3, k = 2, x) # D2 and D3 are not stored all(z == y) # matrices y and z are equal! x <- matrix(1, nrow = 9, ncol = 9) z <- dupl.cross(n = 3, x = x) # same matrix is used to pre- and post-multiplying x z # print result
This function provides the minimum information required to create the duplication matrix.
dupl.info(n = 1, condensed = TRUE)
dupl.info(n = 1, condensed = TRUE)
n |
order of the duplication matrix. |
condensed |
logical. Information should be returned in compact form? |
This function returns a list containing two vectors that represent an element of
the duplication matrix and is accesed by the indexes in vectors row
and col
.
This information is used by function dupl.prod
to do some operations
involving the duplication matrix without forming it. This information also can be
obtained using function duplication
A list containing the following elements:
row |
vector of indexes, each entry represents the row index of the duplication
matrix. Only present if |
col |
vector of indexes, each entry represents the column index of the duplication matrix. |
order |
order of the duplication matrix. |
z <- dupl.info(n = 3, condensed = FALSE) z # where are the ones in duplication of order 3? D3 <- duplication(n = 3, matrix = TRUE) D3 # only recommended if n is very small
z <- dupl.info(n = 3, condensed = FALSE) z # where are the ones in duplication of order 3? D3 <- duplication(n = 3, matrix = TRUE) D3 # only recommended if n is very small
Given the order of a duplication and a conformable matrix , performs
one of the matrix-matrix operations:
, if
side = "left"
and transposed = FALSE
, or
, if
side = "left"
and transposed = TRUE
, or
, if
side = "right"
and transposed = FALSE
, or
, if
side = "right"
and transposed = TRUE
,
where is the duplication matrix of order
. The main aim of
dupl.prod
is to do this matrix multiplication without forming the
duplication matrix.
dupl.prod(n = 1, x, transposed = FALSE, side = "left")
dupl.prod(n = 1, x, transposed = FALSE, side = "left")
n |
order of the duplication matrix. |
x |
numeric matrix (or vector). |
transposed |
logical. Duplication matrix should be transposed? |
side |
a string selecting if duplication matrix is pre-multiplying |
Underlying C
code only uses information provided by dupl.info
to
performs the matrix multiplication. The duplication matrix is never created.
D4 <- duplication(n = 4, matrix = TRUE) x <- matrix(1, nrow = 16, ncol = 2) y <- crossprod(D4, x) z <- dupl.prod(n = 4, x, transposed = TRUE) # D4 is not stored all(z == y) # matrices y and z are equal!
D4 <- duplication(n = 4, matrix = TRUE) x <- matrix(1, nrow = 16, ncol = 2) y <- crossprod(D4, x) z <- dupl.prod(n = 4, x, transposed = TRUE) # D4 is not stored all(z == y) # matrices y and z are equal!
This function returns the duplication matrix of order which transforms,
for a symmetric matrix
,
vech
into
vec
.
duplication(n = 1, matrix = FALSE, condensed = FALSE)
duplication(n = 1, matrix = FALSE, condensed = FALSE)
n |
order of the duplication matrix. |
matrix |
a logical indicating whether the duplication matrix will be returned. |
condensed |
logical. Information should be returned in compact form?. |
This function is a wrapper function for the function dupl.info
. This function
provides the minimum information required to create the duplication matrix. If option
matrix = FALSE
the duplication matrix is stored in two vectors containing the
coordinate list of indexes for rows and columns. Option condensed = TRUE
only
returns vector of indexes for the columns of duplication matrix.
Warning: matrix = TRUE
is not recommended, unless the order n
be small. This matrix can require a huge amount of storage.
Returns an by
matrix (if requested).
Magnus, J.R., Neudecker, H. (1980). The elimination matrix, some lemmas and applications. SIAM Journal on Algebraic Discrete Methods 1, 422-449.
Magnus, J.R., Neudecker, H. (2007). Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd Edition. Wiley, New York.
z <- duplication(n = 100, condensed = TRUE) object.size(z) # 40.5 Kb of storage z <- duplication(n = 100, condensed = FALSE) object.size(z) # 80.6 Kb of storage D100 <- duplication(n = 100, matrix = TRUE) object.size(D100) # 202 Mb of storage, do not request this matrix! # a small example D3 <- duplication(n = 3, matrix = TRUE) a <- matrix(c( 1, 2, 3, 2, 3, 4, 3, 4, 5), nrow = 3) upper <- vech(a) v <- D3 %*% upper all(vec(a) == as.vector(v)) # vectors are equal!
z <- duplication(n = 100, condensed = TRUE) object.size(z) # 40.5 Kb of storage z <- duplication(n = 100, condensed = FALSE) object.size(z) # 80.6 Kb of storage D100 <- duplication(n = 100, matrix = TRUE) object.size(D100) # 202 Mb of storage, do not request this matrix! # a small example D3 <- duplication(n = 3, matrix = TRUE) a <- matrix(c( 1, 2, 3, 2, 3, 4, 3, 4, 5), nrow = 3) upper <- vech(a) v <- D3 %*% upper all(vec(a) == as.vector(v)) # vectors are equal!
Equilibrate a rectangular or symmetric matrix using 2-norm.
equilibrate(x, scale = TRUE)
equilibrate(x, scale = TRUE)
x |
a numeric matrix. |
scale |
a logical value, |
For scale = TRUE
, the equilibrated matrix. The scalings and an approximation
of the condition number, are returned as attributes "scales"
and "condition"
.
If x
is a rectangular matrix, only the columns are equilibrated.
x <- matrix(c(1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 1,-1, 1, 2,-1, 1, 3,-1), ncol = 3, byrow = TRUE) z <- equilibrate(x) apply(z, 2, function(x) sum(x^2)) # all 1 xx <- crossprod(x) equilibrate(xx)
x <- matrix(c(1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 1,-1, 1, 2,-1, 1, 3,-1), ncol = 3, byrow = TRUE) z <- equilibrate(x) apply(z, 2, function(x) sum(x^2)) # all 1 xx <- crossprod(x) equilibrate(xx)
This function returns the Frank matrix of order .
frank(n = 1)
frank(n = 1)
n |
order of the Frank matrix. |
A Frank matrix of order is a square matrix
defined as
Returns an by
matrix.
Frank, W.L. (1958). Computing eigenvalues of complex matrices by determinant evaluation and by methods of Danilewski and Wielandt. Journal of the Society for Industrial and Applied Mathematics 6, 378-392.
F5 <- frank(n = 5) det(F5) # equals 1
F5 <- frank(n = 5) det(F5) # equals 1
It calculates the geometric mean using a Fused-Multiply-and-Add (FMA) compensated scheme for accurate computation of floating-point product.
geomean(x)
geomean(x)
x |
a numeric vector containing the sample observations. |
If x
contains any non-positive values, geomean
returns NA
and
a warning message is displayed.
The geometric mean is a measure of central tendency, which is defined as
This procedure calculates the product required in the geometric mean safely using a compensated scheme as proposed by Graillat (2009).
The geometric mean of the sample, a non-negative number.
Graillat, S. (2009). Accurate floating-point product and exponentiation. IEEE Transactions on Computers 58, 994-1000.
Oguita, T., Rump, S.M., Oishi, S. (2005). Accurate sum and dot product. SIAM Journal on Scientific Computing 26, 1955-1988.
set.seed(149) x <- rlnorm(1000) mean(x) # 1.68169 median(x) # 0.99663 geomean(x) # 1.01688
set.seed(149) x <- rlnorm(1000) mean(x) # 1.68169 median(x) # 0.99663 geomean(x) # 1.01688
This function returns the Hadamard or element-wise product of two matrices x
and y
, that have the same dimensions.
hadamard(x, y = x)
hadamard(x, y = x)
x |
a numeric matrix or vector. |
y |
a numeric matrix or vector. |
A matrix with the same dimension of x
(and y
) which corresponds to the
element-by-element product of the two matrices.
Styan, G.P.H. (1973). Hadamard products and multivariate statistical analysis, Linear Algebra and Its Applications 6, 217-240.
x <- matrix(rep(1:10, times = 5), ncol = 5) y <- matrix(rep(1:5, each = 10), ncol = 5) z <- hadamard(x, y) z
x <- matrix(rep(1:10, times = 5), ncol = 5) y <- matrix(rep(1:5, each = 10), ncol = 5) z <- hadamard(x, y) z
Performs large-sample methods for testing equality of correlated variables.
harris.test(x, test = "Wald")
harris.test(x, test = "Wald")
x |
a matrix or data frame. As usual, rows are observations and columns are variables. |
test |
test statistic to be used. One of "Wald" (default), "log", "robust" or "log-robust". |
A list of class 'harris.test' with the following elements:
statistic |
value of the statistic, i.e. the value of either Wald test, using the log-transformation, or distribution-robust versions of the test (robust and log-robust). |
parameter |
the degrees of freedom for the test statistic, which is chi-square distributed. |
p.value |
the p-value for the test. |
estimate |
the estimated covariance matrix. |
method |
a character string indicating what type of test was performed. |
Harris, P. (1985). Testing the variance homogeneity of correlated variables. Biometrika 72, 103-107.
x <- iris[,1:4] z <- harris.test(x, test = "robust") z
x <- iris[,1:4] z <- harris.test(x, test = "robust") z
This function returns the Helmert matrix of order .
helmert(n = 1)
helmert(n = 1)
n |
order of the Helmert matrix. |
A Helmert matrix of order is a square matrix defined as
Helmert matrix is orthogonal and is frequently used in the analysis of variance (ANOVA).
Returns an by
matrix.
Lancaster, H.O. (1965). The Helmert matrices. The American Mathematical Monthly 72, 4-12.
Gentle, J.E. (2007). Matrix Algebra: Theory, Computations, and Applications in Statistics. Springer, New York.
n <- 1000 set.seed(149) x <- rnorm(n) H <- helmert(n) object.size(H) # 7.63 Mb of storage K <- H[2:n,] z <- c(K %*% x) sum(z^2) # 933.1736 # same that (n - 1) * var(x)
n <- 1000 set.seed(149) x <- rnorm(n) H <- helmert(n) object.size(H) # 7.63 Mb of storage K <- H[2:n,] z <- c(K %*% x) sum(z^2) # 933.1736 # same that (n - 1) * var(x)
Returns TRUE
if the given matrix is lower or upper triangular matrix.
is.lower.tri(x, diag = FALSE) is.upper.tri(x, diag = FALSE)
is.lower.tri(x, diag = FALSE) is.upper.tri(x, diag = FALSE)
x |
a matrix of other R object with |
diag |
logical. Should the diagonal be included? |
Check if a matrix is lower or upper triangular. You can also include diagonal to the check.
x <- matrix(rnorm(10 * 3), ncol = 3) R <- chol(crossprod(x)) is.lower.tri(R) is.upper.tri(R)
x <- matrix(rnorm(10 * 3), ncol = 3) R <- chol(crossprod(x)) is.lower.tri(R) is.upper.tri(R)
Jacobi method is an iterative algorithm for solving a system of linear equations.
jacobi(a, b, start, maxiter = 200, tol = 1e-7)
jacobi(a, b, start, maxiter = 200, tol = 1e-7)
a |
a square numeric matrix containing the coefficients of the linear system. |
b |
a vector of right-hand sides of the linear system. |
start |
a vector for initial starting point. |
maxiter |
the maximum number of iterations. Defaults to |
tol |
tolerance level for stopping iterations. |
Let ,
, and
denote the diagonal, lower
triangular and upper triangular parts of a matrix
. Jacobi's method
solve the equation
, iteratively by rewriting
. Assuming that
is nonsingular
leads to the iteration formula
a vector with the approximate solution, the iterations performed are returned as the attribute 'iterations'.
Golub, G.H., Van Loan, C.F. (1996). Matrix Computations, 3rd Edition. John Hopkins University Press.
a <- matrix(c(5,-3,2,-2,9,-1,3,1,-7), ncol = 3) b <- c(-1,2,3) start <- c(1,1,1) z <- jacobi(a, b, start) z # converged in 15 iterations
a <- matrix(c(5,-3,2,-2,9,-1,3,1,-7), ncol = 3) b <- c(-1,2,3) start <- c(1,1,1) z <- jacobi(a, b, start) z # converged in 15 iterations
Performs an omnibus test for univariate normality.
JarqueBera.test(x, test = "DH")
JarqueBera.test(x, test = "DH")
x |
a numeric vector containing the sample observations. |
test |
test statistic to be used. One of "DH" (Doornik-Hansen, the default), "JB" (Jarque-Bera), "robust" (robust modification by Gel and Gastwirth), "ALM" (Adjusted Lagrange multiplier). |
A list of class 'JarqueBera.test' with the following elements:
statistic |
value of the statistic, i.e. the value of either Doornik-Hansen, Jarque-Bera, or Adjusted Lagrange multiplier test. |
parameter |
the degrees of freedom for the test statistic, which is chi-square distributed. |
p.value |
the p-value for the test. |
skewness |
the estimated skewness coefficient. |
kurtosis |
the estimated kurtosis coefficient. |
method |
a character string indicating what type of test was performed. |
Doornik, J.A., Hansen, H. (2008). An omnibus test for univariate and multivariate normality. Oxford Bulletin of Economics and Statistics 70, 927-939.
Gel, Y.R., Gastwirth, J.L. (2008). A robust modification of the Jarque-Bera test of normality. Economics Letters 99, 30-32.
Jarque, C.M., Bera, A.K. (1980). Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters 6, 255-259.
Urzua, C.M. (1996). On the correct use of omnibus tests for normality. Economics Letters 53, 247-251.
set.seed(149) x <- rnorm(100) z <- JarqueBera.test(x, test = "DH") z set.seed(173) x <- runif(100) z <- JarqueBera.test(x, test = "DH") z
set.seed(149) x <- rnorm(100) z <- JarqueBera.test(x, test = "DH") z set.seed(173) x <- runif(100) z <- JarqueBera.test(x, test = "DH") z
Computes the kronecker product of two matrices, x
and y
.
kronecker.prod(x, y = x)
kronecker.prod(x, y = x)
x |
a numeric matrix or vector. |
y |
a numeric matrix or vector. |
Let be an
and
a
matrix.
The
matrix defined by
is called the Kronecker product of and
.
An array with dimensions dim(x) * dim(y)
.
Magnus, J.R., Neudecker, H. (2007). Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd Edition. Wiley, New York.
kronecker
function from base
package is based on outer
.
Our C
version is slightly faster.
# block diagonal matrix: a <- diag(1:3) b <- matrix(1:4, ncol = 2) kronecker.prod(a, b) # examples with vectors ones <- rep(1, 4) y <- 1:3 kronecker.prod(ones, y) # 12-dimensional vector kronecker.prod(ones, t(y)) # 3 x 3 matrix
# block diagonal matrix: a <- diag(1:3) b <- matrix(1:4, ncol = 2) kronecker.prod(a, b) # examples with vectors ones <- rep(1, 4) y <- 1:3 kronecker.prod(ones, y) # 12-dimensional vector kronecker.prod(ones, t(y)) # 3 x 3 matrix
Given an
by
real matrix and an
-vector
,
this function constructs the Krylov matrix
, where
krylov(a, b, m = ncol(a))
krylov(a, b, m = ncol(a))
a |
a numeric square matrix of order |
b |
a numeric vector of length |
m |
length of the Krylov sequence. |
Returns an by
matrix.
a <- matrix(c(1, 3, 2, -5, 1, 7, 1, 5, -4), ncol = 3, byrow = TRUE) b <- c(1, 1, 1) k <- krylov(a, b, m = 4) k
a <- matrix(c(1, 3, 2, -5, 1, 7, 1, 5, -4), ncol = 3, byrow = TRUE) b <- c(1, 1, 1) k <- krylov(a, b, m = 4) k
Functions to compute measures of multivariate skewness and kurtosis
proposed by Mardia (1970),
and
kurtosis(x) skewness(x)
kurtosis(x) skewness(x)
x |
matrix of data with, say, |
Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519-530.
Mardia, K.V., Zemroch, P.J. (1975). Algorithm AS 84: Measures of multivariate skewness and kurtosis. Applied Statistics 24, 262-265.
setosa <- iris[1:50,1:4] kurtosis(setosa) skewness(setosa)
setosa <- iris[1:50,1:4] kurtosis(setosa) skewness(setosa)
Compute the LDL decomposition of a real symmetric matrix.
ldl(x)
ldl(x)
x |
a symmetric numeric matrix whose LDL decomposition is to be computed. |
The factorization has the form , where
is a diagonal matrix and
is unitary lower triangular.
The LDL decomposition of is returned as a list with components:
lower |
the unitary lower triangular factor |
d |
a vector containing the diagonal elements of |
Golub, G.H., Van Loan, C.F. (1996). Matrix Computations, 3rd Edition. John Hopkins University Press.
a <- matrix(c(2,-1,0,-1,2,-1,0,-1,1), ncol = 3) z <- ldl(a) z # information of LDL factorization # computing det(a) prod(z$d) # product of diagonal elements of D # a non-positive-definite matrix m <- matrix(c(5,-5,-5,3), ncol = 2) try(chol(m)) # fails ldl(m)
a <- matrix(c(2,-1,0,-1,2,-1,0,-1,1), ncol = 3) z <- ldl(a) z # information of LDL factorization # computing det(a) prod(z$d) # product of diagonal elements of D # a non-positive-definite matrix m <- matrix(c(5,-5,-5,3), ncol = 2) try(chol(m)) # fails ldl(m)
lu
computes the LU factorization of a matrix.
lu(x) ## Default S3 method: lu(x) ## S3 method for class 'lu' solve(a, b, ...) is.lu(x)
lu(x) ## Default S3 method: lu(x) ## S3 method for class 'lu' solve(a, b, ...) is.lu(x)
x |
a square numeric matrix whose LU factorization is to be computed. |
a |
an LU factorization of a square matrix. |
b |
a vector or matrix of right-hand sides of equations. |
... |
further arguments passed to or from other methods |
The LU factorization plays an important role in many numerical procedures. In
particular it is the basic method to solve the equation
for given matrix
, and vector
.
solve.lu
is the method for solve
for lu
objects.
is.lu
returns TRUE
if x
is a list
and inherits
from "lu"
.
Unsuccessful results from the underlying LAPACK code will result in an
error giving a positive error code: these can only be interpreted by
detailed study of the Fortran
code.
The LU factorization of the matrix as computed by LAPACK. The components in
the returned value correspond directly to the values returned by DGETRF
.
lu |
a matrix with the same dimensions as |
pivot |
information on the pivoting strategy used during the factorization. |
To compute the determinant of a matrix (do you really need it?),
the LU factorization is much more efficient than using eigenvalues
(eigen
). See det
.
LAPACK uses column pivoting and does not attempt to detect rank-deficient matrices.
Anderson. E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A. Sorensen, D. (1999). LAPACK Users' Guide, 3rd Edition. SIAM.
Golub, G.H., Van Loan, C.F. (1996). Matrix Computations, 3rd Edition. John Hopkins University Press.
extractL
, extractU
, constructX
for
reconstruction of the matrices,
lu2inv
a <- matrix(c(3,2,6,17,4,18,10,-2,-12), ncol = 3) z <- lu(a) z # information of LU factorization # computing det(a) prod(diag(z$lu)) # product of diagonal elements of U # solve linear equations b <- matrix(1:6, ncol = 2) solve(z, b)
a <- matrix(c(3,2,6,17,4,18,10,-2,-12), ncol = 3) z <- lu(a) z # information of LU factorization # computing det(a) prod(diag(z$lu)) # product of diagonal elements of U # solve linear equations b <- matrix(1:6, ncol = 2) solve(z, b)
Returns the original matrix from which the object was constructed or the components of the factorization.
constructX(x) extractL(x) extractU(x)
constructX(x) extractL(x) extractU(x)
x |
object representing an LU factorization. This will typically have
come from a previous call to |
constructX
returns , the original matrix from which the
lu
object was constructed (because of the pivoting the matrix is not exactly
the product between
and
).
extractL
returns . This may be pivoted.
extractU
returns .
lu
.
a <- matrix(c(10,-3,5,-7,2,-1,0,6,5), ncol = 3) z <- lu(a) L <- extractL(z) L U <- extractU(z) U X <- constructX(z) all(a == X)
a <- matrix(c(10,-3,5,-7,2,-1,0,6,5), ncol = 3) z <- lu(a) L <- extractL(z) L U <- extractU(z) U X <- constructX(z) all(a == X)
Invert a square matrix from its LU factorization.
lu2inv(x)
lu2inv(x)
x |
object representing an LU factorization. This will typically have
come from a previous call to |
The inverse of the matrix whose LU factorization was given.
Unsuccessful results from the underlying LAPACK code will result in an
error giving a positive error code: these can only be interpreted by
detailed study of the Fortran
code.
This is an interface to the LAPACK routine DGETRI
. LAPACK is from
https://netlib.org/lapack/ and its guide is listed in the references.
Anderson. E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A. Sorensen, D. (1999). LAPACK Users' Guide, 3rd Edition. SIAM.
Golub, G.H., Van Loan, C.F. (1996). Matrix Computations, 3rd Edition. John Hopkins University Press.
a <- matrix(c(3,2,6,17,4,18,10,-2,-12), ncol = 3) z <- lu(a) a %*% lu2inv(z)
a <- matrix(c(3,2,6,17,4,18,10,-2,-12), ncol = 3) z <- lu(a) a %*% lu2inv(z)
Returns the squared Mahalanobis distance of all rows in and the
vector
=
center
with respect to =
cov
.
This is (for vector ) defined as
Mahalanobis(x, center, cov, inverted = FALSE)
Mahalanobis(x, center, cov, inverted = FALSE)
x |
vector or matrix of data. As usual, rows are observations and columns are variables. |
center |
mean vector of the distribution. |
cov |
covariance matrix ( |
inverted |
logical. If |
Unlike function mahalanobis
, the covariance matrix is factorized using the
Cholesky decomposition, which allows to assess if cov
is positive definite.
Unsuccessful results from the underlying LAPACK code will result in an error message.
x <- cbind(1:6, 1:3) xbar <- colMeans(x) S <- matrix(c(1,4,4,1), ncol = 2) # is negative definite D2 <- mahalanobis(x, center = xbar, S) all(D2 >= 0) # several distances are negative ## next command produces the following error: ## Covariance matrix is possibly not positive-definite ## Not run: D2 <- Mahalanobis(x, center = xbar, S)
x <- cbind(1:6, 1:3) xbar <- colMeans(x) S <- matrix(c(1,4,4,1), ncol = 2) # is negative definite D2 <- mahalanobis(x, center = xbar, S) all(D2 >= 0) # several distances are negative ## next command produces the following error: ## Covariance matrix is possibly not positive-definite ## Not run: D2 <- Mahalanobis(x, center = xbar, S)
Computes the inner product between two rectangular matrices calling BLAS.
matrix.inner(x, y = x)
matrix.inner(x, y = x)
x |
a numeric matrix. |
y |
a numeric matrix. |
a real value, indicating the inner product between two matrices.
x <- matrix(c(1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 1,-1, 1, 2,-1, 1, 3,-1), ncol = 3, byrow = TRUE) y <- matrix(1, nrow = 6, ncol = 3) matrix.inner(x, y) # must be equal matrix.norm(x, type = "Frobenius")^2 matrix.inner(x)
x <- matrix(c(1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 1,-1, 1, 2,-1, 1, 3,-1), ncol = 3, byrow = TRUE) y <- matrix(1, nrow = 6, ncol = 3) matrix.inner(x, y) # must be equal matrix.norm(x, type = "Frobenius")^2 matrix.inner(x)
Computes a matrix norm of x
using LAPACK. The norm can be the one ("1"
)
norm, the infinity ("inf"
) norm, the Frobenius norm, the maximum modulus
("maximum"
) among elements of a matrix, as determined by the value of type
.
matrix.norm(x, type = "Frobenius")
matrix.norm(x, type = "Frobenius")
x |
a numeric matrix. |
type |
character string, specifying the type of matrix norm to be computed. A character indicating the type of norm desired.
|
As function norm
in package base, method of matrix.norm
calls
the LAPACK function DLANGE
.
Note that the 1-, Inf
- and maximum norm is faster to calculate than the Frobenius one.
The matrix norm, a non-negative number.
# a tiny example x <- matrix(c(1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 1,-1, 1, 2,-1, 1, 3,-1), ncol = 3, byrow = TRUE) matrix.norm(x, type = "Frobenius") matrix.norm(x, type = "1") matrix.norm(x, type = "Inf") # an example not that small n <- 1000 x <- .5 * diag(n) + 0.5 * matrix(1, nrow = n, ncol = n) matrix.norm(x, type = "Frobenius") matrix.norm(x, type = "1") matrix.norm(x, type = "Inf") matrix.norm(x, type = "maximum") # equal to 1
# a tiny example x <- matrix(c(1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 1,-1, 1, 2,-1, 1, 3,-1), ncol = 3, byrow = TRUE) matrix.norm(x, type = "Frobenius") matrix.norm(x, type = "1") matrix.norm(x, type = "Inf") # an example not that small n <- 1000 x <- .5 * diag(n) + 0.5 * matrix(1, nrow = n, ncol = n) matrix.norm(x, type = "Frobenius") matrix.norm(x, type = "1") matrix.norm(x, type = "Inf") matrix.norm(x, type = "maximum") # equal to 1
Compute the Cholesky factorization of a real symmetric but not necessarily positive definite matrix.
mchol(x)
mchol(x)
x |
a symmetric but not necessarily positive definite matrix to be factored. |
The lower triangular factor of modified Cholesky decomposition, i.e., the matrix
such that
, where
is a nonnegative diagonal matrix that is zero if
es sufficiently positive
definite.
Gill, P.E., Murray, W., Wright, M.H. (1981). Practical Optimization. Academic Press, London.
Nocedal, J., Wright, S.J. (1999). Numerical Optimization. Springer, New York.
# a non-positive-definite matrix a <- matrix(c(4,2,1,2,6,3,1,3,-.004), ncol = 3) try(chol(a)) # fails z <- mchol(a) z # triangular factor # modified 'a' matrix tcrossprod(z)
# a non-positive-definite matrix a <- matrix(c(4,2,1,2,6,3,1,3,-.004), ncol = 3) try(chol(a)) # fails z <- mchol(a) z # triangular factor # modified 'a' matrix tcrossprod(z)
It calculates the mediancenter (or geometric median) of multivariate data.
mediancenter(x)
mediancenter(x)
x |
a matrix or data frame. As usual, rows are observations and columns are variables. |
The mediancenter for a sample of multivariate observations is computed using a steepest descend method combined with bisection. The mediancenter invariant to rotations of axes and is useful as a multivariate generalization of the median of univariate sample.
A list containing the following named components:
median |
an estimate for the mediancenter of the data. |
iter |
the number of iterations performed, it is negative if a degenerate solution is found. |
Gower, J.C. (1974). Algorithm AS 78: The mediancentre. Applied Statistics 23, 466-470.
x <- cbind(1:10, c(1:3, 8:5, 8:10)) z <- mediancenter(x)$median # degenerate solution xbar <- colMeans(x) plot(x, xlab = "", ylab = "") points(x = xbar[1], y = xbar[2], pch = 16, col = "red") points(x = z[1], y = z[2], pch = 3, col = "blue", lwd = 2)
x <- cbind(1:10, c(1:3, 8:5, 8:10)) z <- mediancenter(x)$median # degenerate solution xbar <- colMeans(x) plot(x, xlab = "", ylab = "") points(x = xbar[1], y = xbar[2], pch = 16, col = "red") points(x = z[1], y = z[2], pch = 3, col = "blue", lwd = 2)
Computes a -norm of vector
. The norm can be the one (
)
norm, Euclidean (
) norm, the infinity (
=
Inf
) norm. The underlying
C
or Fortran
code is inspired on ideas of BLAS Level 1.
minkowski(x, p = 2)
minkowski(x, p = 2)
x |
a numeric vector. |
p |
a number, specifying the type of norm desired. Possible values include
real number greater or equal to 1, or Inf, Default value is |
Method of minkowski
for =
Inf
calls idamax
BLAS function.
For other values, C
or Fortran
subroutines using unrolled cycles are
called.
The vector -norm, a non-negative number.
# a tiny example x <- rnorm(1000) minkowski(x, p = 1) minkowski(x, p = 1.5) minkowski(x, p = 2) minkowski(x, p = Inf) x <- x / minkowski(x) minkowski(x, p = 2) # equal to 1
# a tiny example x <- rnorm(1000) minkowski(x, p = 1) minkowski(x, p = 1.5) minkowski(x, p = 2) minkowski(x, p = Inf) x <- x / minkowski(x) minkowski(x, p = 2) # equal to 1
It calculates up to fourth central moments (or moments about the mean), and the skewness and kurtosis coefficients using an online algorithm.
moments(x)
moments(x)
x |
a numeric vector containing the sample observations. |
The -th central moment is defined as
In particular, the second central moment is the variance of the sample. The sample skewness and kurtosis are defined, respectively, as
A list containing second
, third
and fourth
central moments,
and skewness
and kurtosis
coefficients.
Spicer, C.C. (1972). Algorithm AS 52: Calculation of power sums of deviations about the mean. Applied Statistics 21, 226-227.
var
.
set.seed(149) x <- rnorm(1000) z <- moments(x) z
set.seed(149) x <- rnorm(1000) z <- moments(x) z
Returns an object of class "ols"
that represents a linear model fit.
ols(formula, data, subset, na.action, method = "qr", tol = 1e-7, maxiter = 100, x = FALSE, y = FALSE, contrasts = NULL, ...)
ols(formula, data, subset, na.action, method = "qr", tol = 1e-7, maxiter = 100, x = FALSE, y = FALSE, contrasts = NULL, ...)
formula |
an object of class |
data |
an optional data frame, list or environment (or object coercible
by |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when the data
contain |
method |
the least squares fitting method to be used; the options are |
tol |
tolerance for the conjugate gradients ( |
maxiter |
The maximum number of iterations for the conjugate gradients ( |
x , y
|
logicals. If |
contrasts |
an optional list. See the |
... |
additional arguments (currently disregarded). |
ols
returns an object of class
"ols"
.
The function summary
is used to obtain and print a summary of the
results. The generic accessor functions coefficients
, fitted.values
and residuals
extract various useful features of the value returned by ols
.
An object of class "ols"
is a list containing at least the
following components:
coefficients |
a named vector of coefficients |
residuals |
the residuals, that is response minus fitted values. |
fitted.values |
the fitted mean values. |
RSS |
the residual sum of squares. |
cov.unscaled |
a |
call |
the matched call. |
terms |
the |
contrasts |
(only where relevant) the contrasts used. |
y |
if requested, the response used. |
x |
if requested, the model matrix used. |
model |
if requested (the default), the model frame used. |
# tiny example of regression y <- c(1, 3, 3, 2, 2, 1) x <- matrix(c(1, 1, 2, 1, 3, 1, 1,-1, 2,-1, 3,-1), ncol = 2, byrow = TRUE) f0 <- ols(y ~ x) # intercept is included by default f0 # printing results (QR method was used) f1 <- ols(y ~ x, method = "svd") # using SVD method instead f1
# tiny example of regression y <- c(1, 3, 3, 2, 2, 1) x <- matrix(c(1, 1, 2, 1, 3, 1, 1,-1, 2,-1, 3,-1), ncol = 2, byrow = TRUE) f0 <- ols(y ~ x) # intercept is included by default f0 # printing results (QR method was used) f1 <- ols(y ~ x, method = "svd") # using SVD method instead f1
This function is a switcher among various numerical fitting functions
(ols.fit.cg
, ols.fit.chol
, ols.fit.qr
,
ols.fit.svd
and ols.fit.sweep
). The argument method
does the switching: "qr"
for ols.fit.qr
, etc. This should usually
not be used directly unless by experienced users.
ols.fit(x, y, method = "qr", tol = 1e-7, maxiter = 100)
ols.fit(x, y, method = "qr", tol = 1e-7, maxiter = 100)
x |
design matrix of dimension |
y |
vector of observations of length |
method |
currently, methods |
tol |
tolerance for the conjugate gradients ( |
maxiter |
The maximum number of iterations for the conjugate gradients ( |
a list
with components:
coefficients |
a named vector of coefficients |
residuals |
the residuals, that is response minus fitted values. |
fitted.values |
the fitted mean values. |
RSS |
the residual sum of squares. |
cov.unscaled |
a |
ols.fit.cg
, ols.fit.chol
, ols.fit.qr
,
ols.fit.svd
, ols.fit.sweep
.
set.seed(151) n <- 100 p <- 2 x <- matrix(rnorm(n * p), n, p) # no intercept! y <- rnorm(n) fm <- ols.fit(x = x, y = y, method = "chol") fm
set.seed(151) n <- 100 p <- 2 x <- matrix(rnorm(n * p), n, p) # no intercept! y <- rnorm(n) fm <- ols.fit(x = x, y = y, method = "chol") fm
Fits a linear model, returning the bare minimum computations.
ols.fit.cg(x, y, tol = 1e-7, maxiter = 100) ols.fit.chol(x, y) ols.fit.qr(x, y) ols.fit.svd(x, y) ols.fit.sweep(x, y)
ols.fit.cg(x, y, tol = 1e-7, maxiter = 100) ols.fit.chol(x, y) ols.fit.qr(x, y) ols.fit.svd(x, y) ols.fit.sweep(x, y)
x , y
|
numeric vectors or matrices for the predictors and the response in
a linear model. Typically, but not necessarily, |
tol |
tolerance for the conjugate gradients ( |
maxiter |
The maximum number of iterations for the conjugate gradients ( |
The bare bones of an ols
object: the coefficients, residuals, fitted values,
and some information used by summary.ols
.
set.seed(151) n <- 100 p <- 2 x <- matrix(rnorm(n * p), n, p) # no intercept! y <- rnorm(n) z <- ols.fit.chol(x, y) z
set.seed(151) n <- 100 p <- 2 x <- matrix(rnorm(n * p), n, p) # no intercept! y <- rnorm(n) z <- ols.fit.chol(x, y) z
The power method seeks to determine the eigenvalue of maximum modulus, and a corresponding eigenvector.
power.method(x, only.value = FALSE, maxiter = 100, tol = 1e-8)
power.method(x, only.value = FALSE, maxiter = 100, tol = 1e-8)
x |
a symmetric matrix. |
only.value |
if |
maxiter |
the maximum number of iterations. Defaults to |
tol |
a numeric tolerance. |
When only.value
is not true, as by default, the result is a list with components
"value"
and "vector"
. Otherwise only the dominan eigenvalue is returned.
The performed number of iterations to reach convergence is returned as attribute "iterations"
.
eigen
for eigenvalues and eigenvectors computation.
n <- 1000 x <- .5 * diag(n) + 0.5 * matrix(1, nrow = n, ncol = n) # dominant eigenvalue must be (n + 1) / 2 z <- power.method(x, only.value = TRUE)
n <- 1000 x <- .5 * diag(n) + 0.5 * matrix(1, nrow = n, ncol = n) # dominant eigenvalue must be (n + 1) / 2 z <- power.method(x, only.value = TRUE)
Random vector generation uniformly in the unitary ball.
rball(n = 1, p = 2)
rball(n = 1, p = 2)
n |
the number of samples requested |
p |
dimension of the unitary sphere |
The function rball
is an interface to C routines, which make calls to
subroutines from BLAS.
If n = 1
a p
-dimensional vector, otherwise a matrix of n
rows of random vectors.
Hormann, W., Leydold, J., Derflinger, G. (2004). Automatic Nonuniform Random Variate Generation. Springer, New York.
# generate the sample z <- rball(n = 500) # scatterplot of a random sample of 500 points uniformly distributed # in the unitary ball par(pty = "s") plot(z, xlab = "x", ylab = "y") title("500 points in the ball", font.main = 1)
# generate the sample z <- rball(n = 500) # scatterplot of a random sample of 500 points uniformly distributed # in the unitary ball par(pty = "s") plot(z, xlab = "x", ylab = "y") title("500 points in the ball", font.main = 1)
Fit a linear model by ridge regression, returning an object of class "ridge"
.
ridge(formula, data, subset, lambda = 1.0, method = "GCV", ngrid = 200, tol = 1e-07, maxiter = 50, na.action, x = FALSE, y = FALSE, contrasts = NULL, ...)
ridge(formula, data, subset, lambda = 1.0, method = "GCV", ngrid = 200, tol = 1e-07, maxiter = 50, na.action, x = FALSE, y = FALSE, contrasts = NULL, ...)
formula |
an object of class |
data |
an optional data frame, list or environment (or object coercible
by |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when the data
contain |
lambda |
a scalar or vector of ridge constants. A value of 0 corresponds to ordinary least squares. |
method |
the method for choosing the ridge parameter lambda. If |
ngrid |
number of elements in the grid used to compute the GCV criterion.
Only required if |
tol |
tolerance for the optimization of the GCV criterion. Default is |
maxiter |
maximum number of iterations. The default is 50. |
x , y
|
logicals. If |
contrasts |
an optional list. See the |
... |
additional arguments to be passed to the low level regression fitting functions (not implemented). |
ridge
function fits in linear ridge regression without scaling or centering
the regressors and the response. In addition, If an intercept is present in the model, its
coefficient is penalized.)
A list with the following components:
dims |
dimensions of model matrix. |
coefficients |
a named vector of coefficients. |
scale |
a named vector of coefficients. |
fitted.values |
the fitted mean values. |
residuals |
the residuals, that is response minus fitted values. |
RSS |
the residual sum of squares. |
edf |
the effective number of parameters. |
GCV |
vector (if |
HKB |
HKB estimate of the ridge constant. |
LW |
LW estimate of the ridge constant. |
lambda |
vector (if |
optimal |
value of lambda with the minimum GCV (only relevant if |
iterations |
number of iterations performed by the algorithm (only relevant if |
call |
the matched call. |
terms |
the |
contrasts |
(only where relevant) the contrasts used. |
y |
if requested, the response used. |
x |
if requested, the model matrix used. |
model |
if requested, the model frame used. |
Golub, G.H., Heath, M., Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21, 215-223.
Hoerl, A.E., Kennard, R.W., Baldwin, K.F. (1975). Ridge regression: Some simulations. Communication in Statistics 4, 105-123.
Hoerl, A.E., Kennard, R.W. (1970). Ridge regression: Biased estimation of nonorthogonal problems. Technometrics 12, 55-67.
Lawless, J.F., Wang, P. (1976). A simulation study of ridge and other regression estimators. Communications in Statistics 5, 307-323.
Lee, T.S (1987). Algorithm AS 223: Optimum ridge parameter selection. Applied Statistics 36, 112-118.
z <- ridge(GNP.deflator ~ ., data = longley, lambda = 4, method = "grid") z # ridge regression on a grid over seq(0, 4, length = 200) z <- ridge(GNP.deflator ~ ., data = longley) z # ridge parameter selected using GCV (default)
z <- ridge(GNP.deflator ~ ., data = longley, lambda = 4, method = "grid") z # ridge regression on a grid over seq(0, 4, length = 200) z <- ridge(GNP.deflator ~ ., data = longley) z # ridge parameter selected using GCV (default)
Random number generation from the multivariate normal (Gaussian) distribution.
rmnorm(n = 1, mean = rep(0, nrow(Sigma)), Sigma = diag(length(mean)))
rmnorm(n = 1, mean = rep(0, nrow(Sigma)), Sigma = diag(length(mean)))
n |
the number of samples requested |
mean |
a vector giving the means of each variable |
Sigma |
a positive-definite covariance matrix |
The function rmnorm
is an interface to C
routines, which make calls
to subroutines from LAPACK. The matrix decomposition is internally done using the
Cholesky decomposition. If Sigma
is not non-negative definite then there
will be a warning message.
If a vector of the same length as
mean
, otherwise a
matrix of rows of random vectors.
Devroye, L. (1986). Non-Uniform Random Variate Generation. Springer-Verlag, New York.
# covariance parameters Sigma <- matrix(c(10,3,3,2), ncol = 2) Sigma # generate the sample y <- rmnorm(n = 1000, Sigma = Sigma) var(y) # scatterplot of a random bivariate normal sample with mean # vector zero and covariance matrix 'Sigma' par(pty = "s") plot(y, xlab = "", ylab = "") title("bivariate normal sample", font.main = 1) # QQ-plot of transformed distances z <- WH.normal(y) par(pty = "s") qqnorm(z, main = "Transformed distances QQ-plot") abline(c(0,1), col = "red", lwd = 2, lty = 2)
# covariance parameters Sigma <- matrix(c(10,3,3,2), ncol = 2) Sigma # generate the sample y <- rmnorm(n = 1000, Sigma = Sigma) var(y) # scatterplot of a random bivariate normal sample with mean # vector zero and covariance matrix 'Sigma' par(pty = "s") plot(y, xlab = "", ylab = "") title("bivariate normal sample", font.main = 1) # QQ-plot of transformed distances z <- WH.normal(y) par(pty = "s") qqnorm(z, main = "Transformed distances QQ-plot") abline(c(0,1), col = "red", lwd = 2, lty = 2)
Random vector generation uniformly on the sphere.
rsphere(n = 1, p = 2)
rsphere(n = 1, p = 2)
n |
the number of samples requested |
p |
dimension of the unitary sphere |
The function rsphere
is an interface to C routines, which make calls to
subroutines from BLAS.
If n = 1
a p
-dimensional vector, otherwise a matrix of n
rows of random vectors.
Devroye, L. (1986). Non-Uniform Random Variate Generation. Springer-Verlag, New York.
# generate the sample z <- rsphere(n = 200) # scatterplot of a random sample of 200 points uniformly distributed # on the unit circle par(pty = "s") plot(z, xlab = "x", ylab = "y") title("200 points on the circle", font.main = 1)
# generate the sample z <- rsphere(n = 200) # scatterplot of a random sample of 200 points uniformly distributed # on the unit circle par(pty = "s") plot(z, xlab = "x", ylab = "y") title("200 points on the circle", font.main = 1)
Compute the scaled condition number of a rectangular matrix.
scaled.condition(x, scales = FALSE)
scaled.condition(x, scales = FALSE)
x |
a numeric rectangular matrix. |
scales |
a logical value indicating whether the scaling factors that allow balancing
the columns of |
The columns of a rectangular matrix x
are equilibrated (but not centered),
then the scaled condition number is computed following the guidelines of Belsley (1990).
If requested, the column scalings are returned as the attribute 'scales'
.
Belsley, D.A. (1990). Conditioning Diagnostics: Collinearity and Weak Data in Regression. Wiley, New York.
x <- matrix(c(1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 1,-1, 1, 2,-1, 1, 3,-1), ncol = 3, byrow = TRUE) scaled.condition(x)
x <- matrix(c(1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 1,-1, 1, 2,-1, 1, 3,-1), ncol = 3, byrow = TRUE) scaled.condition(x)
Gauss-Seidel method is an iterative algorithm for solving a system of linear equations.
seidel(a, b, start, maxiter = 200, tol = 1e-7)
seidel(a, b, start, maxiter = 200, tol = 1e-7)
a |
a square numeric matrix containing the coefficients of the linear system. |
b |
a vector of right-hand sides of the linear system. |
start |
a vector for initial starting point. |
maxiter |
the maximum number of iterations. Defaults to |
tol |
tolerance level for stopping iterations. |
Let ,
, and
denote the diagonal, lower
triangular and upper triangular parts of a matrix
. Gauss-Seidel method
solve the equation
, iteratively by rewriting
. Assuming that
is
nonsingular leads to the iteration formula
a vector with the approximate solution, the iterations performed are returned as the attribute 'iterations'.
Golub, G.H., Van Loan, C.F. (1996). Matrix Computations, 3rd Edition. John Hopkins University Press.
a <- matrix(c(5,-3,2,-2,9,-1,3,1,-7), ncol = 3) b <- c(-1,2,3) start <- c(1,1,1) z <- seidel(a, b, start) z # converged in 10 iterations
a <- matrix(c(5,-3,2,-2,9,-1,3,1,-7), ncol = 3) b <- c(-1,2,3) start <- c(1,1,1) z <- seidel(a, b, start) z # converged in 10 iterations
The Sherman-Morrison formula gives a convenient expression for the inverse of the
rank 1 update where
is a
matrix and
,
are
-dimensional vectors. Thus
sherman.morrison(a, b, d = b, inverted = FALSE)
sherman.morrison(a, b, d = b, inverted = FALSE)
a |
a numeric matrix. |
b |
a numeric vector. |
d |
a numeric vector. |
inverted |
logical. If |
Method of sherman.morrison
calls BLAS level 2 subroutines DGEMV
and
DGER
for computational efficiency.
a square matrix of the same order as a
.
n <- 10 ones <- rep(1, n) a <- 0.5 * diag(n) z <- sherman.morrison(a, ones, 0.5 * ones) z
n <- 10 ones <- rep(1, n) a <- 0.5 * diag(n) z <- sherman.morrison(a, ones, 0.5 * ones) z
Perform the sweep operation (or reverse sweep) on the diagonal elements of a symmetric matrix.
sweep.operator(x, k = 1, reverse = FALSE)
sweep.operator(x, k = 1, reverse = FALSE)
x |
a symmetric matrix. |
k |
elements (if |
reverse |
logical. If |
The symmetric sweep operator is a powerful tool in computational statistics with uses in stepwise regression, conditional multivariate normal distributions, MANOVA, and more.
a square matrix of the same order as x
.
Goodnight, J.H. (1979). A tutorial on the SWEEP operator. The American Statistician 33, 149-158.
# tiny example of regression, last column contains 'y' xy <- matrix(c(1, 1, 1, 1, 1, 2, 1, 3, 1, 3, 1, 3, 1, 1,-1, 2, 1, 2,-1, 2, 1, 3,-1, 1), ncol = 4, byrow = TRUE) z <- crossprod(xy) z <- sweep.operator(z, k = 1:3) cf <- z[1:3,4] # regression coefficients RSS <- z[4,4] # residual sum of squares # an example not that small x <- matrix(rnorm(1000 * 100), ncol = 100) xx <- crossprod(x) z <- sweep.operator(xx, k = 1)
# tiny example of regression, last column contains 'y' xy <- matrix(c(1, 1, 1, 1, 1, 2, 1, 3, 1, 3, 1, 3, 1, 1,-1, 2, 1, 2,-1, 2, 1, 3,-1, 1), ncol = 4, byrow = TRUE) z <- crossprod(xy) z <- sweep.operator(z, k = 1:3) cf <- z[1:3,4] # regression coefficients RSS <- z[4,4] # residual sum of squares # an example not that small x <- matrix(rnorm(1000 * 100), ncol = 100) xx <- crossprod(x) z <- sweep.operator(xx, k = 1)
This function provides the information required to create the symmetrizer matrix.
symm.info(n = 1)
symm.info(n = 1)
n |
order of the symmetrizer matrix. |
This function returns a list containing vectors that represent an element of the
symmetrizer matrix and is accesed by the indexes in vectors row
, col
and values contained in val
. This information is used by function symm.prod
to do some operations involving the symmetrizer matrix without forming it. This
information also can be obtained using function symmetrizer
.
A list containing the following elements:
row |
vector of indexes, each entry represents the row index of the symmetrizer matrix. |
col |
vector of indexes, each entry represents the column index of the symmetrizer matrix. |
val |
vector of values, each entry represents the value of the symmetrizer matrix
at element given by |
order |
order of the symmetrizer matrix. |
z <- symm.info(n = 3) z # elements in symmetrizer matrix of order 3 N3 <- symmetrizer(n = 3, matrix = TRUE) N3 # only recommended if n is very small
z <- symm.info(n = 3) z # elements in symmetrizer matrix of order 3 N3 <- symmetrizer(n = 3, matrix = TRUE) N3 # only recommended if n is very small
Given the order of a symmetrizer matrix of order
and a
conformable matrix
, performs one of the matrix-matrix operations:
, if
side = "left"
, or
, if
side = "right"
,
The main aim of symm.prod
is to do this matrix multiplication without forming
the symmetrizer matrix.
symm.prod(n = 1, x = NULL, side = "left")
symm.prod(n = 1, x = NULL, side = "left")
n |
order of the symmetrizer matrix. |
x |
numeric matrix (or vector). |
side |
a string selecting if symmetrizer matrix is pre-multiplying |
Underlying C
code only uses information provided by symm.info
to
performs the matrix multiplication. The symmetrizer matrix is never created.
N4 <- symmetrizer(n = 4, matrix = TRUE) x <- matrix(1:32, ncol = 2) y <- N4 %*% x z <- symm.prod(n = 4, x) # N4 is not stored all(z == y) # matrices y and z are equal!
N4 <- symmetrizer(n = 4, matrix = TRUE) x <- matrix(1:32, ncol = 2) y <- N4 %*% x z <- symm.prod(n = 4, x) # N4 is not stored all(z == y) # matrices y and z are equal!
This function returns the symmetrizer matrix of order which transforms,
for every
matrix
,
vec
into
vec
.
symmetrizer(n = 1, matrix = FALSE)
symmetrizer(n = 1, matrix = FALSE)
n |
order of the symmetrizer matrix. |
matrix |
a logical indicating whether the symmetrizer matrix will be returned. |
This function is a wrapper function for the function symm.info
. This function
provides the information required to create the symmetrizer matrix. If option matrix = FALSE
the symmetrizer matrix is stored in three vectors containing the coordinate list of
indexes for rows, columns and the values.
Warning: matrix = TRUE
is not recommended, unless the order n
be small. This matrix can require a huge amount of storage.
Returns an by
matrix (if requested).
Abadir, K.M., Magnus, J.R. (2005). Matrix Algebra. Cambridge University Press.
Magnus, J.R., Neudecker, H. (2007). Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd Edition. Wiley, New York.
z <- symmetrizer(n = 100) object.size(z) # 319 Kb of storage N100 <- symmetrizer(n = 100, matrix = TRUE) # time: < 2 secs object.size(N100) # 800 Mb of storage, do not request this matrix! # a small example N3 <- symmetrizer(n = 3, matrix = TRUE) a <- matrix(rep(c(2,4,6), each = 3), ncol = 3) a b <- 0.5 * (a + t(a)) b v <- N3 %*% vec(a) all(vec(b) == as.vector(v)) # vectors are equal!
z <- symmetrizer(n = 100) object.size(z) # 319 Kb of storage N100 <- symmetrizer(n = 100, matrix = TRUE) # time: < 2 secs object.size(N100) # 800 Mb of storage, do not request this matrix! # a small example N3 <- symmetrizer(n = 3, matrix = TRUE) a <- matrix(rep(c(2,4,6), each = 3), ncol = 3) a b <- 0.5 * (a + t(a)) b v <- N3 %*% vec(a) all(vec(b) == as.vector(v)) # vectors are equal!
This function returns a vector obtained by stacking the columns of .
vec(x)
vec(x)
x |
a numeric matrix. |
Let be a
by
matrix, then
vec
()
is a
-dimensional vector.
x <- matrix(rep(1:10, each = 10), ncol = 10) x y <- vec(x) y
x <- matrix(rep(1:10, each = 10), ncol = 10) x y <- vec(x) y
This function returns a vector obtained by stacking the lower triangular part of a square matrix.
vech(x)
vech(x)
x |
a square matrix. |
Let be a
by
matrix, then
vech
()
is a
-dimensional vector.
x <- matrix(rep(1:10, each = 10), ncol = 10) x y <- vech(x) y
x <- matrix(rep(1:10, each = 10), ncol = 10) x y <- vech(x) y
Returns the Wilson-Hilferty transformation of random variables with chi-squared distribution.
WH.normal(x)
WH.normal(x)
x |
vector or matrix of data with, say, |
Let be a random variable, where
denotes the squared Mahalanobis
distance defined as
Thus the Wilson-Hilferty transformation is given by
and is approximately distributed as a standard normal distribution. This
is useful, for instance, in the construction of QQ-plots.
Wilson, E.B., and Hilferty, M.M. (1931). The distribution of chi-square. Proceedings of the National Academy of Sciences of the United States of America 17, 684-688.
x <- iris[,1:4] z <- WH.normal(x) par(pty = "s") qqnorm(z, main = "Transformed distances QQ-plot") abline(c(0,1), col = "red", lwd = 2, lty = 2)
x <- iris[,1:4] z <- WH.normal(x) par(pty = "s") qqnorm(z, main = "Transformed distances QQ-plot") abline(c(0,1), col = "red", lwd = 2, lty = 2)
Applies the whitening transformation to a data matrix based on the Cholesky decomposition of the empirical covariance matrix.
whitening(x, Scatter = NULL)
whitening(x, Scatter = NULL)
x |
vector or matrix of data with, say, |
Scatter |
covariance (or scatter) matrix ( |
Returns the whitened data matrix , where
with the empirical covariance matrix.
Kessy, A., Lewin, A., Strimmer, K. (2018). Optimal whitening and decorrelation. The American Statistician 72, 309-314.
x <- iris[,1:4] species <- iris[,5] pairs(x, col = species) # plot of Iris # whitened data z <- whitening(x) pairs(z, col = species) # plot of
x <- iris[,1:4] species <- iris[,5] pairs(x, col = species) # plot of Iris # whitened data z <- whitening(x) pairs(z, col = species) # plot of
Returns the Wilson-Hilferty transformation of random variables with Gamma distribution.
wilson.hilferty(x, shape, rate = 1)
wilson.hilferty(x, shape, rate = 1)
x |
a numeric vector containing Gamma distributed deviates. |
shape , rate
|
shape and rate parameters. Must be positive. |
Let be a random variable following a Gamma distribution with parameters
=
shape
and =
rate
with density
where ,
,
and consider the random variable
. Thus, the Wilson-Hilferty transformation
is approximately distributed as a standard normal distribution. This is useful, for instance, in the construction of QQ-plots.
Terrell, G.R. (2003). The Wilson-Hilferty transformation is locally saddlepoint. Biometrika 90, 445-453.
Wilson, E.B., and Hilferty, M.M. (1931). The distribution of chi-square. Proceedings of the National Academy of Sciences of the United States of America 17, 684-688.
x <- rgamma(n = 300, shape = 2, rate = 1) z <- wilson.hilferty(x, shape = 2, rate = 1) par(pty = "s") qqnorm(z, main = "Transformed Gamma QQ-plot") abline(c(0,1), col = "red", lwd = 2, lty = 2)
x <- rgamma(n = 300, shape = 2, rate = 1) z <- wilson.hilferty(x, shape = 2, rate = 1) par(pty = "s") qqnorm(z, main = "Transformed Gamma QQ-plot") abline(c(0,1), col = "red", lwd = 2, lty = 2)