Title: | Functions and C++ Header Files for Computation on Manifolds |
---|---|
Description: | We provide a number of algorithms to estimate fundamental statistics including Fréchet mean and geometric median for manifold-valued data. Also, C++ header files are contained that implement elementary operations on manifolds such as Sphere, Grassmann, and others. See Bhattacharya and Bhattacharya (2012) <doi:10.1017/CBO9781139094764> if you are interested in statistics on manifolds, and Absil et al (2007, ISBN:9780691132983) on computational aspects of optimization on matrix manifolds. |
Authors: | Kisung You [aut, cre] |
Maintainer: | Kisung You <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.5 |
Built: | 2024-11-11 06:52:29 UTC |
Source: | CRAN |
We provide a number of algorithms to estimate fundamental statistics including Fréchet mean and geometric median for manifold-valued data. Also, C++ header files are contained that implement elementary operations on manifolds such as Sphere, Grassmann, and others. See Bhattacharya and Bhattacharya (2012) <doi:10.1017/CBO9781139094764> if you are interested in statistics on manifolds, and Absil et al (2007) <isbn:978-0-691-13298-3> on computational aspects of optimization on matrix manifolds.
Suppose we have to two curves evaluated at finite locations
,
rbase.curvedist
computes distance between two curves and
using finite difference approximation with trapezoidal rule.
In order to induce no interpolation, two curves should be of same length.
rbase.curvedist(curve1, curve2, t = NULL, type = c("intrinsic", "extrinsic"))
rbase.curvedist(curve1, curve2, t = NULL, type = c("intrinsic", "extrinsic"))
curve1 |
a S3 object of |
curve2 |
a S3 object of |
t |
a length- |
type |
type of Riemannian distance ( |
computed distance.
## Not run: ### Generate two sets of 10 2-frames in R^4 : as grassmann points ndata = 10 data1 = array(0,c(4,2,ndata)) data2 = array(0,c(4,2,ndata)) for (i in 1:ndata){ tgt = matrix(rnorm(4*4),nrow=4) data1[,,i] = qr.Q(qr(tgt))[,1:2] } for (i in 1:ndata){ tgt = matrix(rnorm(4*5, sd=2),nrow=4) data2[,,i] = qr.Q(qr(tgt))[,1:2] } gdata1 = riemfactory(data1, name="grassmann") # wrap as 'riemdata' class. gdata2 = riemfactory(data2, name="grassmann") rbase.curvedist(gdata1, gdata2) ## End(Not run)
## Not run: ### Generate two sets of 10 2-frames in R^4 : as grassmann points ndata = 10 data1 = array(0,c(4,2,ndata)) data2 = array(0,c(4,2,ndata)) for (i in 1:ndata){ tgt = matrix(rnorm(4*4),nrow=4) data1[,,i] = qr.Q(qr(tgt))[,1:2] } for (i in 1:ndata){ tgt = matrix(rnorm(4*5, sd=2),nrow=4) data2[,,i] = qr.Q(qr(tgt))[,1:2] } gdata1 = riemfactory(data1, name="grassmann") # wrap as 'riemdata' class. gdata2 = riemfactory(data2, name="grassmann") rbase.curvedist(gdata1, gdata2) ## End(Not run)
For manifold-valued data, Fréchet mean is the solution of following cost function,
for a given data and
is the geodesic distance
between two points on manifold
. It uses a gradient descent method
with a backtracking search rule for updating.
rbase.mean(input, maxiter = 496, eps = 1e-06, parallel = FALSE)
rbase.mean(input, maxiter = 496, eps = 1e-06, parallel = FALSE)
input |
a S3 object of |
maxiter |
maximum number of iterations for gradient descent algorithm. |
eps |
stopping criterion for the norm of gradient. |
parallel |
a flag for enabling parallel computation. |
a named list containing
an estimate Fréchet mean.
number of iterations until convergence.
Kisung You
Karcher H (1977). “Riemannian center of mass and mollifier smoothing.” Communications on Pure and Applied Mathematics, 30(5), 509–541. ISSN 00103640, 10970312.
Kendall WS (1990). “Probability, Convexity, and Harmonic Maps with Small Image I: Uniqueness and Fine Existence.” Proceedings of the London Mathematical Society, s3-61(2), 371–406. ISSN 00246115.
Afsari B, Tron R, Vidal R (2013). “On the Convergence of Gradient Descent for Finding the Riemannian Center of Mass.” SIAM Journal on Control and Optimization, 51(3), 2230–2260. ISSN 0363-0129, 1095-7138.
### Generate 100 data points on Sphere S^2 near (0,0,1). ndata = 100 theta = seq(from=-0.99,to=0.99,length.out=ndata)*pi tmpx = cos(theta) + rnorm(ndata,sd=0.1) tmpy = sin(theta) + rnorm(ndata,sd=0.1) ### Wrap it as 'riemdata' class data = list() for (i in 1:ndata){ tgt = c(tmpx[i],tmpy[i],1) data[[i]] = tgt/sqrt(sum(tgt^2)) # project onto Sphere } data = riemfactory(data, name="sphere") ### Compute Fréchet Mean out1 = rbase.mean(data) out2 = rbase.mean(data,parallel=TRUE) # test parallel implementation
### Generate 100 data points on Sphere S^2 near (0,0,1). ndata = 100 theta = seq(from=-0.99,to=0.99,length.out=ndata)*pi tmpx = cos(theta) + rnorm(ndata,sd=0.1) tmpy = sin(theta) + rnorm(ndata,sd=0.1) ### Wrap it as 'riemdata' class data = list() for (i in 1:ndata){ tgt = c(tmpx[i],tmpy[i],1) data[[i]] = tgt/sqrt(sum(tgt^2)) # project onto Sphere } data = riemfactory(data, name="sphere") ### Compute Fréchet Mean out1 = rbase.mean(data) out2 = rbase.mean(data,parallel=TRUE) # test parallel implementation
For manifold-valued data, geometric median is the solution of following cost function,
for a given data ,
the geodesic distance
between two points on manifold
, and
a logarithmic mapping onto the
tangent space
. Weiszfeld's algorithms is employed.
rbase.median(input, maxiter = 496, eps = 1e-06, parallel = FALSE)
rbase.median(input, maxiter = 496, eps = 1e-06, parallel = FALSE)
input |
a S3 object of |
maxiter |
maximum number of iterations for gradient descent algorithm. |
eps |
stopping criterion for the norm of gradient. |
parallel |
a flag for enabling parallel computation. |
a named list containing
an estimate geometric median.
number of iterations until convergence.
Kisung You
Fletcher PT, Venkatasubramanian S, Joshi S (2009). “The geometric median on Riemannian manifolds with application to robust atlas estimation.” NeuroImage, 45(1), S143–S152. ISSN 10538119.
Aftab K, Hartley R, Trumpf J (2015). “Generalized Weiszfeld Algorithms for Lq Optimization.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(4), 728–745. ISSN 0162-8828, 2160-9292.
### Generate 100 data points on Sphere S^2 near (0,0,1). ndata = 100 theta = seq(from=-0.99,to=0.99,length.out=ndata)*pi tmpx = cos(theta) + rnorm(ndata,sd=0.1) tmpy = sin(theta) + rnorm(ndata,sd=0.1) ### Wrap it as 'riemdata' class data = list() for (i in 1:ndata){ tgt = c(tmpx[i],tmpy[i],1) data[[i]] = tgt/sqrt(sum(tgt^2)) # project onto Sphere } data = riemfactory(data, name="sphere") ### Compute Geodesic Median out1 = rbase.median(data) out2 = rbase.median(data,parallel=TRUE) # test parallel implementation
### Generate 100 data points on Sphere S^2 near (0,0,1). ndata = 100 theta = seq(from=-0.99,to=0.99,length.out=ndata)*pi tmpx = cos(theta) + rnorm(ndata,sd=0.1) tmpy = sin(theta) + rnorm(ndata,sd=0.1) ### Wrap it as 'riemdata' class data = list() for (i in 1:ndata){ tgt = c(tmpx[i],tmpy[i],1) data[[i]] = tgt/sqrt(sum(tgt^2)) # project onto Sphere } data = riemfactory(data, name="sphere") ### Compute Geodesic Median out1 = rbase.median(data) out2 = rbase.median(data,parallel=TRUE) # test parallel implementation
Geodesic distance is the length of (locally) shortest path
connecting two points
. Some manifolds have closed-form expression, while
others need numerical approximation.
rbase.pdist(input, parallel = FALSE)
rbase.pdist(input, parallel = FALSE)
input |
a S3 object of |
parallel |
a flag for enabling parallel computation. |
an matrix of pairwise distances.
### Generate 10 2-frames in R^4 ndata = 10 data = array(0,c(4,2,ndata)) for (i in 1:ndata){ tgt = matrix(rnorm(4*4),nrow=4) data[,,i] = qr.Q(qr(tgt))[,1:2] } ## Compute Pairwise Distances as if for Grassmann and Stiefel Manifold A = rbase.pdist(riemfactory(data,name="grassmann")) B = rbase.pdist(riemfactory(data,name="stiefel")) ## Visual Comparison in Two Cases opar = par(no.readonly=TRUE) par(mfrow=c(1,2)) image(A, col=gray((0:100)/100), main="Grassmann") image(B, col=gray((0:100)/100), main="Stiefel") par(opar)
### Generate 10 2-frames in R^4 ndata = 10 data = array(0,c(4,2,ndata)) for (i in 1:ndata){ tgt = matrix(rnorm(4*4),nrow=4) data[,,i] = qr.Q(qr(tgt))[,1:2] } ## Compute Pairwise Distances as if for Grassmann and Stiefel Manifold A = rbase.pdist(riemfactory(data,name="grassmann")) B = rbase.pdist(riemfactory(data,name="stiefel")) ## Visual Comparison in Two Cases opar = par(no.readonly=TRUE) par(mfrow=c(1,2)) image(A, col=gray((0:100)/100), main="Grassmann") image(B, col=gray((0:100)/100), main="Stiefel") par(opar)
Unlike rbase.pdist
, rbase.pdist2
takes two sets of data and
and compute
number of pairwise distances for all
and
.
rbase.pdist2(input1, input2, parallel = FALSE)
rbase.pdist2(input1, input2, parallel = FALSE)
input1 |
a S3 object of |
input2 |
a S3 object of |
parallel |
a flag for enabling parallel computation. |
an matrix of pairwise distances.
### Generate 10 2-frames in R^4 : as grassmann points ndata = 10 data = array(0,c(4,2,ndata)) for (i in 1:ndata){ tgt = matrix(rnorm(4*4),nrow=4) data[,,i] = qr.Q(qr(tgt))[,1:2] } gdata = riemfactory(data, name="grassmann") ## Compute Pairwise Distances using pdist and pdist2 A = rbase.pdist(gdata) B = rbase.pdist2(gdata,gdata) ## Visual Comparison in Two Cases opar = par(no.readonly=TRUE) par(mfrow=c(1,2), pty="s") image(A, col=gray((0:100)/100), main="pdist") image(B, col=gray((0:100)/100), main="pdist2") par(opar)
### Generate 10 2-frames in R^4 : as grassmann points ndata = 10 data = array(0,c(4,2,ndata)) for (i in 1:ndata){ tgt = matrix(rnorm(4*4),nrow=4) data[,,i] = qr.Q(qr(tgt))[,1:2] } gdata = riemfactory(data, name="grassmann") ## Compute Pairwise Distances using pdist and pdist2 A = rbase.pdist(gdata) B = rbase.pdist2(gdata,gdata) ## Visual Comparison in Two Cases opar = par(no.readonly=TRUE) par(mfrow=c(1,2), pty="s") image(A, col=gray((0:100)/100), main="pdist") image(B, col=gray((0:100)/100), main="pdist2") par(opar)
Robust estimator for mean starts from dividing the data into
equally sized
sets. For each subset, it first estimates Fréchet mean. It then follows a step to aggregate
sample means by finding a geometric median.
rbase.robust(input, k = 5, maxiter = 496, eps = 1e-06, parallel = FALSE)
rbase.robust(input, k = 5, maxiter = 496, eps = 1e-06, parallel = FALSE)
input |
a S3 object of |
k |
number of subsets for which the data be divided into. |
maxiter |
maximum number of iterations for gradient descent algorithm and Weiszfeld algorithm. |
eps |
stopping criterion for the norm of gradient. |
parallel |
a flag for enabling parallel computation. |
a named list containing
an estimate geometric median.
number of iterations until convergence.
Kisung You
Lerasle M, Oliveira R~I (2011). “Robust empirical mean Estimators.” ArXiv e-prints. 1112.3914.
Minsker S (2013). “Geometric median and robust estimation in Banach spaces.” ArXiv e-prints. 1308.1334.
Feng J, Xu H, Mannor S (2014). “Distributed Robust Learning.” ArXiv e-prints. 1409.5937.
### Generate 100 data points on Sphere S^2 near (0,0,1). ndata = 100 theta = seq(from=-0.99,to=0.99,length.out=ndata)*pi tmpx = cos(theta) + rnorm(ndata,sd=0.1) tmpy = sin(theta) + rnorm(ndata,sd=0.1) ### Wrap it as 'riemdata' class data = list() for (i in 1:ndata){ tgt = c(tmpx[i],tmpy[i],1) data[[i]] = tgt/sqrt(sum(tgt^2)) # project onto Sphere } data = riemfactory(data, name="sphere") ### Compute Robust Fréchet Mean out1 = rbase.robust(data) out2 = rbase.robust(data,parallel=TRUE) # test parallel implementation
### Generate 100 data points on Sphere S^2 near (0,0,1). ndata = 100 theta = seq(from=-0.99,to=0.99,length.out=ndata)*pi tmpx = cos(theta) + rnorm(ndata,sd=0.1) tmpy = sin(theta) + rnorm(ndata,sd=0.1) ### Wrap it as 'riemdata' class data = list() for (i in 1:ndata){ tgt = c(tmpx[i],tmpy[i],1) data[[i]] = tgt/sqrt(sum(tgt^2)) # project onto Sphere } data = riemfactory(data, name="sphere") ### Compute Robust Fréchet Mean out1 = rbase.robust(data) out2 = rbase.robust(data,parallel=TRUE) # test parallel implementation
Most of the functions for RiemBase
package require data to be wrapped as a riemdata
class.
Since manifolds of interests endow data points with specific constraints, the function riemfactory
first checks the requirements to characterize the manifold and then wraps the data into
riemdata
class, which is simply a list of manifold-valued data and the name of manifold.
Manifold name input is, fortunately, case-insensitive.
riemfactory( data, name = c("euclidean", "grassmann", "spd", "sphere", "stiefel") )
riemfactory( data, name = c("euclidean", "grassmann", "spd", "sphere", "stiefel") )
data |
data to be wrapped as
|
name |
the name of Riemmanian manifold for data to which data belong. |
a named riemdata
S3 object containing
a list of manifold-valued data points.
size of each data matrix.
name of the manifold of interests.
# Test with Sphere S^2 in R^3 example ## Prepare a matrix and list of 20 samples on S^2 sp.mat = array(0,c(3,20)) # each vector will be recorded as a column sp.list = list() for (i in 1:20){ tgt = rnorm(3) # sample random numbers tgt = tgt/sqrt(sum(tgt*tgt)) # normalize sp.mat[,i] = tgt # record it as column vector sp.list[[i]] = tgt # record it as an element in a list } ## wrap it using 'riemfactory' rsp1 = riemfactory(sp.mat, name="Sphere") rsp2 = riemfactory(sp.list, name="spHeRe")
# Test with Sphere S^2 in R^3 example ## Prepare a matrix and list of 20 samples on S^2 sp.mat = array(0,c(3,20)) # each vector will be recorded as a column sp.list = list() for (i in 1:20){ tgt = rnorm(3) # sample random numbers tgt = tgt/sqrt(sum(tgt*tgt)) # normalize sp.mat[,i] = tgt # record it as column vector sp.list[[i]] = tgt # record it as an element in a list } ## wrap it using 'riemfactory' rsp1 = riemfactory(sp.mat, name="Sphere") rsp2 = riemfactory(sp.list, name="spHeRe")