Title: Product Density Estimation for ICA using Tilted Gaussian Density Estimates
Description: A direct and flexible method for estimating an ICA model. This approach estimates the density of each component directly via a tilted Gaussian. The tilt functions are estimated via a Poisson GAM. Details can be found in "Elements of Statistical Learning (2nd Edition)", Section 14.7.4.
Authors: Trevor Hastie, Rob Tibshirani
Maintainer: Trevor Hastie <[email protected]>
License: GPL-2
Version: 1.1
Built: 2024-11-15 06:24:15 UTC
Source: CRAN
The Amari distance is a measure of distance between two nonsingular matrices. It is useful for checking convergence in ICA algorithms and for comparing solutions.
amari(V, W, orth = FALSE)
V: first matrix
W: second matrix
orth: are the matrices orthogonal; default is FALSE
Formula is given in second reference below, page 570.
a numeric distance between 0 and 1
Trevor Hastie
Bach, F. and Jordan, M. (2002). Kernel independent component analysis,
Journal of Machine Learning Research 3: 1-48
Hastie, T., Tibshirani, R. and Friedman, J. (2009) Elements of
Statistical Learning (2nd edition), Springer.
https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
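The formula from Bach and Jordan (2002) can be sketched in a few lines of R. This is a minimal sketch for intuition only; amari_sketch is an illustrative name, and the package's amari() may differ in normalization details (and handles the orth case separately):

```r
# Sketch of the Amari distance between nonsingular V and W (illustrative;
# the package's amari() may normalize differently).
amari_sketch <- function(V, W) {
  A <- abs(V %*% solve(W))
  p <- nrow(A)
  rows <- sum(rowSums(A) / apply(A, 1, max) - 1)
  cols <- sum(colSums(A) / apply(A, 2, max) - 1)
  (rows + cols) / (2 * p * (p - 1))  # scaled to lie in [0, 1]
}
```

The distance is zero exactly when V W^-1 is a scaled permutation matrix, i.e. when the two unmixing solutions agree up to the usual ICA indeterminacies.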
ProDenICA
dist <- "n"
N <- 1024
p <- 2
A0 <- mixmat(p)
s <- scale(cbind(rjordan(dist, N), rjordan(dist, N)))
x <- s %*% A0
### Whiten the data
x <- scale(x, TRUE, FALSE)
sx <- svd(x)
### orthogonalization function
x <- sqrt(N) * sx$u
target <- solve(A0)
target <- diag(sx$d) %*% t(sx$v) %*% target / sqrt(N)
W0 <- matrix(rnorm(2 * 2), 2, 2)
W0 <- ICAorthW(W0)
W1 <- ProDenICA(x, W0 = W0, trace = TRUE, Gfunc = G1)$W
fit <- ProDenICA(x, W0 = W0, Gfunc = GPois, trace = TRUE, density = TRUE)
W2 <- fit$W
# distance of FastICA from target
amari(W1, target)
# distance of ProDenICA from target
amari(W2, target)
Contrast functions for computing the negentropy criteria used in FastICA; see references.
G1(s, a=1) G0(s, a=1)
s: estimated independent component
a: additional tuning parameter (only used in G1)
a list with components:
Gs: contrast function evaluated at the values of s
gs: estimated first derivative of Gs
gps: estimated second derivative of Gs
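As an illustration of the (Gs, gs, gps) layout, here is a hedged sketch of a log-cosh contrast in the FastICA style. The exact forms used by G1 and G0 may differ; logcosh_contrast is an illustrative name:

```r
# Illustrative log-cosh contrast: G(s) = log(cosh(a*s))/a, with its first
# and second derivatives, returned in the same (Gs, gs, gps) layout.
logcosh_contrast <- function(s, a = 1) {
  list(Gs  = log(cosh(a * s)) / a,
       gs  = tanh(a * s),              # d/ds G(s)
       gps = a * (1 - tanh(a * s)^2))  # d^2/ds^2 G(s)
}
```

The derivatives gs and gps are what the fixed-point updates in FastICA-style algorithms actually consume; Gs itself is only needed for the negentropy criterion.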
Trevor Hastie and Rob Tibshirani
Hyvarinen, A., Karhunen, J. and Oja, E. (2001). Independent Component
Analysis, Wiley, New York
Hastie, T. and Tibshirani, R. (2003) Independent Component Analysis
through Product Density Estimation in Advances in Neural Information
Processing Systems 15 (Becker, S. and Obermayer, K., eds), MIT Press,
Cambridge, MA. pp 649-656
Hastie, T., Tibshirani, R. and Friedman, J. (2009) Elements of
Statistical Learning (2nd edition), Springer.
https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
GPois and ProDenICA
p <- 2
### Can use letters a-r below for dist
dist <- "n"
N <- 1024
A0 <- mixmat(p)
s <- scale(cbind(rjordan(dist, N), rjordan(dist, N)))
x <- s %*% A0
fit <- ProDenICA(x, Gfunc = G1, whiten = TRUE)
This is a contrast method for ProDenICA. It fits a tilted Gaussian density estimate by multiplying the Gaussian density by an exponential tilt function, which is estimated using a cubic smoothing spline.
GPois(x, df = 6, B = 500, order = 1, widen = 1.2, density.return = FALSE, ...)
x: vector of real values
df: degrees of freedom for the smoothing-spline fit; default is 6
B: number of grid points for the density estimate; default is 500
order: a robustness parameter to avoid responding to outliers in x; default is 1
widen: an expansion factor to widen the range of x; default is 1.2
density.return: logical variable, with default FALSE; if TRUE, the estimated tilted Gaussian density is returned
...: additional arguments to gam; typically not used
See Section 14.7.4 of Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2009, 2nd Edition) for details.
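The mechanics of the tilted-Gaussian fit can be sketched with base R plus the splines package: bin the data, then fit a Poisson GLM for the bin counts with offset log phi(s), so the smooth term estimates the log tilt g(s). GPois itself uses gam() with a smoothing spline; tilt_density_sketch, its arguments, and the spline basis here are illustrative:

```r
library(splines)

# Illustrative tilted-Gaussian density estimate: the Poisson mean in a bin
# is n * delta * phi(s) * exp(g(s)), so putting log(n * delta * phi(s)) in
# the offset leaves the spline term to estimate the log tilt g(s).
tilt_density_sketch <- function(x, B = 100, df = 6, widen = 1.2) {
  m <- mean(range(x))
  r <- m + widen * (range(x) - m)            # widened grid, as in GPois
  breaks <- seq(r[1], r[2], length = B + 1)
  mids <- (breaks[-1] + breaks[-(B + 1)]) / 2
  delta <- diff(breaks)[1]
  counts <- hist(x, breaks = breaks, plot = FALSE)$counts
  off <- log(length(x) * delta * dnorm(mids))
  fit <- glm(counts ~ ns(mids, df = df), family = poisson, offset = off)
  g <- predict(fit) - off                    # estimated log tilt g(s)
  list(s = mids, density = dnorm(mids) * exp(g))
}
```

Because a Poisson GLM with an intercept matches total fitted counts to total observed counts, the returned density integrates to 1 over the grid.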
a list with components:
Gs: estimated contrast function, which is the log of the tilting function, evaluated at the original values of x
gs: estimated first derivative of Gs
gps: estimated second derivative of Gs
density: if density.return=TRUE, the estimated tilted Gaussian density
Trevor Hastie and Rob Tibshirani
Hastie, T. and Tibshirani, R. (2003) Independent Component Analysis
through Product Density Estimation in Advances in Neural Information
Processing Systems 15 (Becker, S. and Obermayer, K., eds), MIT Press,
Cambridge, MA. pp 649-656
Hastie, T., Tibshirani, R. and Friedman, J. (2009) Elements of
Statistical Learning (2nd edition), Springer.
https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
ProDenICA, G1 and G0
p <- 2
### Can use letters a-r below for dist
dist <- "n"
N <- 1024
A0 <- mixmat(p)
s <- scale(cbind(rjordan(dist, N), rjordan(dist, N)))
x <- s %*% A0
fit <- ProDenICA(x, Gfunc = GPois, whiten = TRUE, density = TRUE)
par(mfrow = c(2, 1))
plot(fit)
Use the SVD to orthogonalize a matrix.
ICAorthW(W)
W: input matrix
Simply replaces the D matrix of the SVD of W with the identity.
orthogonalized version of W
If W=UDV', then returns UV'
Trevor Hastie
W0 <- matrix(rnorm(2 * 2), 2, 2)
W0 <- ICAorthW(W0)
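The one-line construction described above can be sketched directly; orthW_sketch is an illustrative name, not the package function:

```r
# Replace the singular values of W with ones: if W = U D V', return U V'.
orthW_sketch <- function(W) {
  sw <- svd(W)
  sw$u %*% t(sw$v)
}
```

For square nonsingular W the result satisfies W1'W1 = I, which is why it serves as a starting point (and projection step) for orthogonal unmixing matrices.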
A simple function for generating a well-behaved random square mixing matrix.
mixmat(p = 2)
p: dimension of the matrix; default is 2
Generates a random matrix by constructing its SVD. The singular values are drawn from a uniform distribution on [1,2], guaranteeing a condition number between 1 and 2.
a pxp matrix
Trevor Hastie
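The construction in the details can be sketched as follows; mixmat_sketch is an illustrative name, and the way the orthogonal factors are drawn is an assumption:

```r
# Build a random p x p matrix from its SVD, with singular values drawn
# from Uniform[1, 2] so the condition number is at most 2.
mixmat_sketch <- function(p = 2) {
  u <- svd(matrix(rnorm(p * p), p, p))$u   # random orthogonal factor
  v <- svd(matrix(rnorm(p * p), p, p))$v   # another random orthogonal factor
  u %*% diag(runif(p, 1, 2), p) %*% t(v)
}
```

Since u and v are orthogonal, the singular values of the result are exactly the drawn values in [1, 2], so the matrix is never close to singular.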
Fits an ICA model by directly estimating the densities of the independent components using Poisson GAMs. The densities have the form of tilted Gaussians, and hence directly estimate the contrast functions that lead to negentropy measures. This function supports Section 14.7.4 of Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2009, 2nd Edition). FastICA fits are available via the contrast functions G1 and G0.
ProDenICA(x, k = p, W0 = NULL, whiten = FALSE, maxit = 20, thresh = 1e-07,
          restarts = 0, trace = FALSE, Gfunc = GPois, eps.rank = 1e-07, ...)
x: input matrix
k: number of components required, less than or equal to the number of columns of x
W0: optional initial matrix (for comparing algorithms)
whiten: logical variable: should x be whitened? If TRUE, the SVD of X = UDV' is computed, and U is used (up to rank(X) columns); k is also reduced to min(k, rank(X)). If FALSE (the default), the user is assumed to have pre-whitened x (if not, the function may not perform properly)
maxit: maximum number of iterations; default is 20
thresh: convergence threshold, in terms of relative change in the Amari metric; default is 1e-7
restarts: number of random restarts; default is 0
trace: trace iterations; default is FALSE
Gfunc: contrast functional which is the basis for the negentropy measure; default is GPois
eps.rank: threshold for deciding the rank of x when whiten=TRUE; default is 1e-7
...: additional arguments for Gfunc
See Section 14.7.4 of Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2009, 2nd Edition)
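The SVD-based pre-whitening that ProDenICA expects (or performs itself when whiten=TRUE) can be sketched in base R; whiten_sketch is an illustrative name:

```r
# Center x and map it to sqrt(N) * U from the SVD x = U D V', so the
# whitened data have sample covariance equal to the identity.
whiten_sketch <- function(x) {
  x <- scale(x, center = TRUE, scale = FALSE)
  sx <- svd(x)
  sqrt(nrow(x)) * sx$u
}
```

After whitening, the ICA problem reduces to finding an orthogonal unmixing matrix, which is what the W component of the fit represents.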
An object of S3 class "ProDenICA" is returned, with the following components:
W: orthonormal matrix that takes the whitened version of x to the independent components
negentropy: the total negentropy measure of this solution
s: the matrix of k independent components
whitener: if whiten=TRUE, the matrix that whitens x
call: the call that produced this object
density: if Gfunc=GPois and density=TRUE, the estimated tilted Gaussian density for each component
Trevor Hastie and Rob Tibshirani
Hastie, T. and Tibshirani, R. (2003) Independent Component Analysis
through Product Density Estimation in Advances in Neural Information
Processing Systems 15 (Becker, S. and Obermayer, K., eds), MIT Press,
Cambridge, MA. pp 649-656
Hastie, T., Tibshirani, R. and Friedman, J. (2009) Elements of
Statistical Learning (2nd edition), Springer.
https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
GPois, G1 and the plot method.
p <- 2
### Can use letters a-r below for dist
dist <- "n"
N <- 1024
A0 <- mixmat(p)
s <- scale(cbind(rjordan(dist, N), rjordan(dist, N)))
x <- s %*% A0
### Whiten the data
x <- scale(x, TRUE, FALSE)
sx <- svd(x)
### orthogonalization function
x <- sqrt(N) * sx$u
target <- solve(A0)
target <- diag(sx$d) %*% t(sx$v) %*% target / sqrt(N)
W0 <- matrix(rnorm(2 * 2), 2, 2)
W0 <- ICAorthW(W0)
W1 <- ProDenICA(x, W0 = W0, trace = TRUE, Gfunc = G1)$W
fit <- ProDenICA(x, W0 = W0, Gfunc = GPois, trace = TRUE, density = TRUE)
W2 <- fit$W
# distance of FastICA from target
amari(W1, target)
# distance of ProDenICA from target
amari(W2, target)
par(mfrow = c(2, 1))
plot(fit)
Functions for generating the source densities used in Bach and Jordan (2002), and reused in Hastie and Tibshirani (2003).
rjordan(letter, n, ...) djordan(letter, x, ...)
letter: one of the 18 letters 'a' through 'r'
n: number of samples
x: ordinates at which to compute the density
...: placeholder for additional arguments
This function produces the example densities used in Bach and Jordan (2002), and reused by Hastie and Tibshirani (2003). They include the 't', the uniform, mixtures of exponentials, and many mixtures of Gaussian densities. Each is standardized to have mean zero and variance 1.
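As an illustration of the standardization, here is a hedged sketch of one such source: a symmetric two-component Gaussian mixture scaled to unit variance. rmix_sketch is an illustrative name, not one of the package's 18 letters:

```r
# Sample from 0.5 * N(-mu, 1) + 0.5 * N(mu, 1), then divide by the
# mixture's standard deviation sqrt(1 + mu^2) so the result has
# mean 0 and variance 1.
rmix_sketch <- function(n, mu = 2) {
  signs <- sample(c(-1, 1), n, replace = TRUE)
  (rnorm(n) + signs * mu) / sqrt(1 + mu^2)
}
```

Standardizing every source to mean 0 and variance 1 makes the whitening step well defined and the recovered components comparable across the 18 densities.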
Either a vector of density values the length of x for djordan, or a vector of n draws for rjordan
Trevor Hastie
Bach, F. and Jordan, M. (2002). Kernel independent component analysis,
Journal of Machine Learning Research 3: 1-48
Hastie, T. and Tibshirani, R. (2003) Independent Component Analysis
through Product Density Estimation in Advances in Neural Information
Processing Systems 15 (Becker, S. and Obermayer, K., eds), MIT Press,
Cambridge, MA. pp 649-656
Hastie, T., Tibshirani, R. and Friedman, J. (2009) Elements of
Statistical Learning (2nd edition), Springer.
https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf
ProDenICA
dist <- "n"
N <- 1024
s <- scale(cbind(rjordan(dist, N), rjordan(dist, N)))