Title: | Seeded Canonical Correlation Analysis |
---|---|
Description: | Functions for dimension reduction through seeded canonical correlation analysis are provided. Classical canonical correlation analysis (CCA) is a useful statistical method in multivariate data analysis, but its use is limited for large-p, small-n data because it requires matrix inversion. To overcome this, seeded CCA was proposed in Im, Gang and Yoo (2015) \doi{10.1002/cem.2691}. Seeded CCA is a two-step procedure. First, the sets of variables are initially reduced by successively projecting cov(X,Y) or cov(Y,X) onto cov(X) and cov(Y), respectively, without loss of information for canonical correlation analysis, following Cook, Li and Chiaromonte (2007) \doi{10.1093/biomet/asm038} and Lee and Yoo (2014) \doi{10.1111/anzs.12057}. Then, the canonical correlation analysis is finalized with the two initially reduced sets of variables. |
Authors: | Jae Keun Yoo, Bo-Young Kim |
Maintainer: | Jae Keun Yoo <[email protected]> |
License: | GPL (>= 2.0) |
Version: | 3.1 |
Built: | 2024-12-01 08:48:43 UTC |
Source: | CRAN |
Returns the estimated coefficients of ordinary least squares, or of partial least squares through iterative projections. It works only for subclasses "seedols" and "seedpls".
## S3 method for class 'seedCCA'
coef(object, u=NULL, ...)
object |
an object of class "seedCCA" |
u |
numeric, the number of projections. The default is NULL. This option applies to PLS only: if u is set to k, the coefficient estimates after k projections are returned. |
... |
arguments passed to the coef.method |
######## data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
fit.ols1 <- seedCCA(X[,1:4], Y[,1], type="cca")
fit.pls1 <- seedCCA(X,Y[,1],type="pls")
coef(fit.ols1)
coef(fit.pls1)
coef(fit.pls1, u=4)
This data set contains measurements from quantitative NIR spectroscopy. The example studied arises from an experiment done to test the feasibility of NIR spectroscopy for measuring the composition of biscuit dough pieces (formed but unbaked biscuits). Two similar sample sets were made up, with the standard recipe varied to provide a large range for each of the four constituents under investigation: fat, sucrose, dry flour, and water. The calculated percentages of these four ingredients are the 4 responses. There are 40 samples in the calibration or training set (with sample 23 being an outlier) and a further 32 samples in the separate prediction or validation set (with sample 21, which is row 61 of the data frame, considered an outlier).
An NIR reflectance spectrum is available for each dough piece. The spectral data consist of 700 points measured from 1100 to 2498 nanometers (nm) in steps of 2 nm.
data(cookie)
A data frame of dimension 72 x 704. The first 700 columns correspond to the NIR reflectance spectrum, the last four columns correspond to the four constituents fat, sucrose, dry flour, and water. The first 40 rows correspond to the calibration data, the last 32 rows correspond to the prediction data.
Please cite the following papers if you use this data set.
P.J. Brown, T. Fearn, and M. Vannucci (2001) Bayesian Wavelet Regression on Curves with Applications to a Spectroscopic Calibration Problem. Journal of the American Statistical Association, 96, pp. 398-408.
B.G. Osborne, T. Fearn, A.R. Miller, and S. Douglas (1984) Application of Near-Infrared Reflectance Spectroscopy to Compositional Analysis of Biscuits and Biscuit Dough. Journal of the Science of Food and Agriculture, 35, pp. 99-105.
data(cookie)                    # load data
X<-as.matrix(cookie[,1:700])    # extract NIR spectra
Y<-as.matrix(cookie[,701:704])  # extract constituents
Xtrain<-X[1:40,]                # extract training data
Ytrain<-Y[1:40,]                # extract training data
Xtest<-X[41:72,]                # extract test data
Ytest<-Y[41:72,]                # extract test data
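The examples in this manual keep the columns seq(141, 651, by=2) of cookie and drop rows 23 and 61. The short base-R sketch below assumes only the layout stated above: column j holds the reflectance at 1100 + 2(j - 1) nm, rows 1-40 are calibration data, and rows 41-72 are validation data. It shows which wavelengths and rows that selection keeps and why the resulting X has more columns than rows.

```r
# Column layout taken from the description above: 700 spectral columns at
# 1100, 1102, ..., 2498 nm, followed by the 4 constituent columns.
wavelength <- seq(1100, 2498, by = 2)
stopifnot(length(wavelength) == 700)

# Spectral columns used throughout the examples in this manual
myseq <- seq(141, 651, by = 2)
used_nm <- wavelength[myseq]      # every 4 nm, from 1380 to 2400 nm
range(used_nm)

# Rows dropped in the examples: calibration sample 23 (row 23) and
# validation sample 21 (row 40 + 21 = 61)
outlier_rows <- c(23, 40 + 21)
n <- 72 - length(outlier_rows)    # observations kept
p <- length(myseq)                # spectral variables kept
c(n = n, p = p)                   # 70 rows, 256 columns: p > n
```

Because p > n here, cov(X) is singular. This is the large-p, small-n situation that the seeded methods are meant to handle.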
Draws a scree plot of the eigenvalues of cov(first.set, second.set) and returns these eigenvalues with their cumulative percentages, to help select the number d of its largest eigenvectors.
covplot(X, Y, mind=NULL)
X |
numeric matrix (n * p), X |
Y |
numeric matrix (n * r), Y |
mind |
numeric, the number of eigenvalues whose cumulative percentages are shown. The default is NULL, in which case min(p,r) is used. |
eigenvalues |
the ordered eigenvalues of cov(X,Y) |
cum.percent |
the cumulative percentages of the eigenvalues |
num.evecs |
the numbers of eigenvectors needed for the cumulative percentage to exceed 0.6, 0.7, 0.8, and 0.9 |
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
covplot(X, Y)
covplot(X, Y, mind=4)
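The summary that covplot reports can be approximated in a few lines of base R. The helper scree_summary below is hypothetical and is not the package's code. It assumes that the "eigenvalues of cov(X, Y)" are the eigenvalues of cov(X, Y) cov(Y, X), which are the squared singular values of cov(X, Y). The package's exact definition may differ.

```r
# Hypothetical helper, not the package's covplot().
# Assumption: eigenvalues of cov(X, Y) %*% cov(Y, X), i.e. squared singular values.
scree_summary <- function(X, Y, mind = NULL) {
  ev <- svd(cov(X, Y))$d^2                  # min(p, r) values, decreasing
  if (is.null(mind)) mind <- length(ev)     # default mirrors min(p, r)
  cum <- cumsum(ev) / sum(ev)               # cumulative proportions
  cuts <- c(0.6, 0.7, 0.8, 0.9)
  # smallest number of eigenvectors whose cumulative proportion exceeds each cut
  num <- vapply(cuts, function(cc) which(cum > cc)[1], integer(1))
  list(eigenvalues = ev, cum.percent = cum[seq_len(mind)], num.evecs = num)
}

set.seed(1)
X <- matrix(rnorm(100 * 8), 100, 8)
Y <- X[, 1:3] + matrix(rnorm(100 * 3), 100, 3)
res <- scree_summary(X, Y)
res$num.evecs
```

Choosing d as one of the entries of num.evecs keeps a fixed share of the total, in the same spirit as the cut option of seedCCA.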
Returns the results of the finalized step in seeded CCA.
finalCCA(X, Y)
X |
numeric matrix (n * d), the initially reduced first set of variables |
Y |
numeric matrix (n * d), the initially reduced second set of variables |
cor |
canonical correlations from the finalized step |
xcoef |
the estimated canonical coefficient matrix of the initially reduced first set of variables |
ycoef |
the estimated canonical coefficient matrix of the initially reduced second set of variables |
Xscores |
the finalized canonical variates of the first set of variables |
Yscores |
the finalized canonical variates of the second set of variables |
######## data(cookie) ########
data(cookie)
myseq <- seq(141, 651, by=2)
X <- as.matrix(cookie[-c(23,61), myseq])
Y <- as.matrix(cookie[-c(23,61), 701:704])
min.pr <- min( dim(X)[2], dim(Y)[2])
MX0 <- iniCCA(X, Y, u=4, num.d=min.pr)
ini.X <- X %*% MX0
finalCCA(ini.X, Y)
######## data(nutrimouse) ########
data(nutrimouse)
Y<-as.matrix(nutrimouse$lipid)
X<-as.matrix(nutrimouse$gene)
MX0 <- iniCCA(X, Y, u=4, num.d=4)
MY0 <- iniCCA(Y, X, u=5, num.d=4)
ini.X <- X %*% MX0
ini.Y <- Y %*% MY0
finalCCA(ini.X, ini.Y)
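The finalized step is an ordinary CCA on data that are now small enough for it. The base-R sketch below is not the package's code. It shows what such a step computes: the canonical correlations are the singular values of Qx'Qy, where Qx and Qy are orthonormal bases (thin QR) of the column-centered sets. stats::cancor returns the same values.

```r
# Canonical correlations as the singular values of Qx' Qy, where Qx and Qy
# are orthonormal bases (thin QR) of the column-centered data matrices.
cca_cor <- function(X, Y) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  Yc <- scale(Y, center = TRUE, scale = FALSE)
  Qx <- qr.Q(qr(Xc))
  Qy <- qr.Q(qr(Yc))
  svd(crossprod(Qx, Qy))$d
}

set.seed(42)
n <- 60
X <- matrix(rnorm(n * 3), n, 3)
Y <- cbind(X[, 1] + 0.5 * rnorm(n), matrix(rnorm(n * 2), n, 2))
rho <- cca_cor(X, Y)
rho
cancor(X, Y)$cor    # the same values from stats::cancor
```

This computation needs full-column-rank centered data with more rows than columns. The initial reduction step exists to provide such data.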
Returns fitted values of ordinary least squares, or of partial least squares through iterative projections. It works only for subclasses "seedols" and "seedpls".
## S3 method for class 'seedCCA'
fitted(object, u=NULL, ...)
object |
an object of class "seedCCA" |
u |
numeric, the number of projections. The default is NULL. This option applies to PLS only: if u is set to k, the fitted values after k projections are returned. |
... |
arguments passed to the fitted.method |
######## data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
fit.ols1 <- seedCCA(X[,1:4], Y[,1], type="cca")
fit.pls1 <- seedCCA(X, Y[,1], type="pls")
fit.pls2 <- seedCCA(X, Y[,1], type="pls", scale=TRUE)
fitted(fit.ols1)
fitted(fit.pls1)
fitted(fit.pls1, u=4)
fitted(fit.pls2, u=4)
Returns the canonical coefficient matrix from the initial step of seeded CCA. The initial reduction is applied only to the first set of variables, given as the first argument. The value of num.d must be less than or equal to the dimension of the second set.
iniCCA(X, Y, u, num.d)
X |
numeric matrix (n * p), the first set of variables: this set of variables alone is reduced. |
Y |
numeric matrix (n * r), the second set of variables |
u |
numeric, the terminating index of the projections, that is, the number of projections |
num.d |
numeric, the number of leading eigenvectors of cov(X,Y) used in place of cov(X,Y) when min(p,r) is relatively large compared with n. |
B |
the initial canonical coefficient matrix obtained with u projections |
######## data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
min.pr <- min( dim(X)[2], dim(Y)[2])
MX0 <- iniCCA(X, Y, u=4, num.d=min.pr)
ini.X <- X%*%MX0
######## data(nutrimouse) ########
data(nutrimouse)
Y<-as.matrix(nutrimouse$lipid)
X<-as.matrix(nutrimouse$gene)
MX0 <- iniCCA(X, Y, u=4, num.d=4)
MY0 <- iniCCA(Y, X, u=5, num.d=4)
ini.X <- X %*% MX0
ini.Y <- Y %*% MY0
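The point of the initial step is to replace a p-column X (p > n) with a few linear combinations so that ordinary CCA becomes feasible. The sketch below is hypothetical and is not the package's iniCCA(). It assumes that the seed S is the first num.d left singular vectors of cov(X, Y), and that the coefficient matrix is R_u = (S, Sx S, ..., Sx^(u-1) S) with Sx = cov(X). This follows the iterative-projection idea of Cook et al. (2007). The package's exact construction may differ.

```r
# Hypothetical sketch of an initial reduction step, not the package's iniCCA().
# Assumptions: seed S = first num.d left singular vectors of cov(X, Y);
# reduction matrix R_u = (S, Sx S, ..., Sx^(u-1) S) with Sx = cov(X).
ini_reduce <- function(X, Y, u, num.d) {
  Sx <- cov(X)                               # never inverted
  S <- svd(cov(X, Y))$u[, seq_len(num.d), drop = FALSE]
  blocks <- vector("list", u)
  blocks[[1]] <- S
  for (k in seq_len(u - 1)) blocks[[k + 1]] <- Sx %*% blocks[[k]]
  do.call(cbind, blocks)                     # p x (u * num.d)
}

set.seed(7)
n <- 30; p <- 50; r <- 3
X <- matrix(rnorm(n * p), n, p)              # p > n: cov(X) is singular
Y <- X[, 1:r] + matrix(rnorm(n * r), n, r)
MX0 <- ini_reduce(X, Y, u = 4, num.d = r)
ini.X <- X %*% MX0                           # n x 12: fewer columns than rows
dim(ini.X)
cancor(ini.X, Y)$cor                         # ordinary CCA is now feasible
```

With u = 4 and num.d = 3, the 50 original columns shrink to 12. This mirrors how ini.X <- X %*% MX0 is used in the examples above.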
The nutrimouse dataset comes from a nutrition study in mice. It was provided by Pascal Martin from the Toxicology and Pharmacology Laboratory (French National Institute for Agronomic Research).
data(nutrimouse)
gene: data frame (40 * 120) with numerical variables
lipid: data frame (40 * 21) with numerical variables
diet: factor vector (40)
genotype: factor vector (40)
Two sets of variables were measured on 40 mice:
expressions of 120 genes potentially involved in nutritional problems.
concentrations of 21 hepatic fatty acids.
The 40 mice were distributed in a two-factor experimental design (4 replicates):
Genotype (two-level factor): wild-type and PPARalpha -/-
Diet (five-level factor): the oils used to prepare the experimental diets were corn and colza oils (50/50) for a reference diet (REF), hydrogenated coconut oil for a saturated fatty acid diet (COC), sunflower oil for an Omega6 fatty acid-rich diet (SUN), linseed oil for an Omega3-rich diet (LIN), and corn/colza/enriched fish oils (43/43/14) for the FISH diet.
P. Martin, H. Guillou, F. Lasserre, S. Déjean, A. Lan, J-M. Pascussi, M. San Cristobal, P. Legrand, P. Besse, T. Pineau (2007) Novel aspects of PPARalpha-mediated regulation of lipid and xenobiotic metabolism revealed through a nutrigenomic study. Hepatology, 45, 767-777.
data(nutrimouse)
boxplot(nutrimouse$lipid)
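The sample size follows from the design described above. This assumes that the 4 replicates are per genotype-by-diet cell, which gives 2 x 5 x 4 = 40 mice. The sketch builds that balanced layout in base R. With the real data, table(nutrimouse$genotype, nutrimouse$diet) can be used to check it.

```r
# Balanced layout implied by the description (assumption: 4 mice per cell).
design <- expand.grid(
  replicate = 1:4,
  diet = factor(c("REF", "COC", "SUN", "LIN", "FISH")),
  genotype = factor(c("wild-type", "PPARalpha -/-"))
)
nrow(design)                          # 2 genotypes x 5 diets x 4 replicates = 40
table(design$genotype, design$diet)   # 4 mice in every genotype-by-diet cell
```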
The function plots an object of class "seedCCA". The resulting plot depends on the subclass, which is determined by the value of type.
## S3 method for class 'seedCCA'
plot(x, ref=90, eps=0.01, ...)
x |
an object of class "seedCCA" |
ref |
numeric, the position of the reference line, between 0 and 100. The default is 90. It applies only to subclass "finalCCA". |
eps |
numeric, a threshold for terminating projections, between 0 and 1. The default is 0.01. It applies only to subclass "seedpls". |
... |
arguments passed to the plot.method |
subclass "finalCCA": the function makes a plot for percents of cumulative canonical correlations.
subclass "seedpls": the function returns a proper number of projections and plots the projection increment against the number of projections. The proper number of projections is marked with a blue vertical bar in the plot. An output is returned only for subclass "seedpls"; see the Value section.
subclass "seedols": no plot is produced.
subclass "selectu": the function plots the increments of the iterative projections stored in the "selectu" object.
proper.u |
proper value of the number of projections |
nFu |
increments (n*Fu) of the iterative projection |
u |
the maximum number of projections from "seedpls" object |
eps |
a value for terminating the projection. The default value is equal to 0.01. |
######## data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
fit.cca <- seedCCA(X[,1:4],Y[,1:4],type="cca")
fit.seed1 <- seedCCA(X,Y, type="seed1")
fit.pls1 <- seedCCA(X,Y[,1],type="pls")
fit.selu <- selectu(X,Y, type="seed2")
plot(fit.cca)
plot(fit.seed1, ref=95)
plot(fit.pls1)
plot(fit.pls1, eps=0.00001)
plot(fit.selu)
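For subclass "finalCCA", the plot shows cumulative percentages of the canonical correlations against a reference line at ref. The sketch below uses a hypothetical helper, cum_cor_percent. It assumes the percentages are 100 * cumsum(cor) / sum(cor), which may not match the package's exact definition. It also reports the first number of canonical pairs that reaches ref.

```r
# Hypothetical sketch of a "cumulative canonical correlation" summary.
# Assumption: percentages are 100 * cumsum(cor) / sum(cor).
cum_cor_percent <- function(cor, ref = 90) {
  stopifnot(ref >= 0, ref <= 100)
  pct <- 100 * cumsum(cor) / sum(cor)
  k <- which(pct >= ref)[1]            # first number of pairs reaching ref
  if (interactive()) {                 # draw only in an interactive session
    plot(seq_along(pct), pct, type = "b",
         xlab = "number of canonical pairs", ylab = "cumulative %")
    abline(h = ref, lty = 2)           # reference line
  }
  list(percent = pct, first.above.ref = k)
}

res <- cum_cor_percent(c(0.95, 0.80, 0.30, 0.10), ref = 90)
res$percent            # about 44.2, 81.4, 95.3, 100
res$first.above.ref    # 3
```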
The function returns the projection of a seed matrix onto the column space of M with respect to the Sx inner product.
Pm(M, Sx, seed)
M |
numeric matrix (p * k), a basis matrix whose columns span the subspace to project onto |
Sx |
numeric matrix (p * p), an inner-product matrix |
seed |
numeric matrix, the seed matrix to be projected |
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
## using cov(X,Y) as a seed matrix
seed <- cov(X,Y)
col.num <- dim(seed)[2]
M <- iniCCA(X, Y, u=4, num.d=col.num)
Sx <- cov(X)
Pm(M, Sx, seed)
## using the first 2 largest eigenvectors of cov(X,Y) as a seed matrix
seed2 <- svd(cov(X,Y))$u[,1:2]
M2 <- iniCCA(X, Y, u=4, num.d=2)
Pm(M2, Sx, seed2)
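The projection of a matrix onto the column space of M with respect to the Sx inner product has the standard closed form P = M (M' Sx M)^{-1} M' Sx. The helper proj_sx below is a hypothetical re-implementation of that formula, not the package's Pm(). It checks the defining properties: P is idempotent, it leaves M unchanged, and the residual is Sx-orthogonal to M.

```r
# Hypothetical re-implementation of an Sx-inner-product projection:
# proj = M (M' Sx M)^{-1} M' Sx seed  (only the k x k matrix M' Sx M is inverted)
proj_sx <- function(M, Sx, seed) {
  M %*% solve(crossprod(M, Sx %*% M), crossprod(M, Sx %*% seed))
}

set.seed(2)
A <- matrix(rnorm(30 * 4), 30, 4)
Sx <- crossprod(A) / 29                 # 4 x 4 positive-definite inner product
M <- matrix(rnorm(4 * 2), 4, 2)         # basis of a 2-dimensional subspace
seed <- matrix(rnorm(4 * 3), 4, 3)      # matrix to project
P <- proj_sx(M, Sx, seed)
crossprod(M, Sx %*% (seed - P))         # numerically zero: residual is Sx-orthogonal to M
```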
The function prints an object of class "seedCCA".
The function prints the estimated coefficients, if they exist.
For subclass "finalCCA", canonical correlations are additionally reported.
For subclass "selectu", the increments, the suggested number of projections, and the values of type and eps are reported.
## S3 method for class 'seedCCA'
print(x, ...)
x |
an object of class "seedCCA" |
... |
arguments passed to the print.method |
######## data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
fit.seed2 <- seedCCA(X,Y)
fit.seed2
print(fit.seed2)
The function seedCCA mainly implements the seeded canonical correlation analysis proposed by Im et al. (2015). Depending on the value of type, it conducts one of four methods; type must be one of c("cca", "seed1", "seed2", "pls").
seedCCA(X,Y,type="seed2",ux=NULL,uy=NULL,u=10,eps=0.01,cut=0.9,d=NULL,AS=TRUE,scale=FALSE)
X |
numeric vector or matrix (n * p), the first set of variables |
Y |
numeric vector or matrix (n * r), the second set of variables |
type |
character, the method to use: one of "cca", "seed1", "seed2", and "pls". The default is "seed2". |
ux |
numeric, maximum number of projections for X. The default is NULL. If this is not NULL, it overrides the option u. |
uy |
numeric, maximum number of projections for Y. The default is NULL. If this is not NULL, it overrides the option u. |
u |
numeric, maximum number of projections. The default is 10. This is used for |
eps |
numeric, the criterion for terminating iterative projections. The default is 0.01. Projections stop when the increment is less than eps. |
cut |
numeric, between 0 and 1. The default is 0.9. If d is NULL, cut is used to select the number of eigenvectors of cov(X, Y). |
d |
numeric, the user-selected number of largest eigenvectors of cov(X, Y) and cov(Y, X). The default is NULL. This only works for |
AS |
logical, whether projections are stopped automatically. The default is TRUE. |
scale |
logical, whether to scale the predictors to have mean zero and standard deviation one. The default is FALSE. |
Let p and r denote the numbers of variables in the two sets and n the sample size. The option type="cca" works only when max(p,r) < n; seedCCA then conducts standard canonical correlation analysis (Johnson and Wichern, 2007). If type="cca" is given and either p or r equals one, ordinary least squares (OLS) is done instead of canonical correlation analysis. If max(p,r) >= n, either type="seed1" or type="seed2" has to be chosen; this is the main purpose of seedCCA. If type="seed1", only the set with more variables (say X, with p variables, compared with Y, with r variables) is initially reduced by the iterative projection approach (Cook et al. 2007), and the canonical correlation analysis of the initially reduced X and the original Y is then finalized. If type="seed2", both X and Y are initially reduced, and the canonical correlation analysis of the two initially reduced sets is then finalized. If type="pls", partial least squares (PLS) is done. In that case the first set of variables in seedCCA is treated as the predictors and the second set as the response, so the order of the arguments matters. The response can be multivariate. The subclass returned by seedCCA depends on the value of type:
type="cca": subclass "finalCCA" (min(p,r) > 1 and max(p,r) < n)
type="cca": subclass "seedols" (either p or r is equal to 1)
type="seed1" and type="seed2": subclass "finalCCA" (max(p,r) > n)
type="pls": subclass "seedpls" (p > n and r < n)
Consequently, plot(object) produces a different figure depending on the subclass of the object.
The returned values are listed below in this order, by type:
type="cca": standard CCA (max(p,r) < n, min(p,r) > 1) / "finalCCA" subclass
type="cca": ordinary least squares (max(p,r) < n, min(p,r) = 1) / "seedols" subclass
type="seed1": seeded CCA, case 1 (max(p,r) > n and p > r) / "finalCCA" subclass
type="seed1": seeded CCA, case 1 (max(p,r) > n and p <= r) / "finalCCA" subclass
type="seed2": seeded CCA, case 2 (max(p,r) > n) / "finalCCA" subclass
type="pls": partial least squares (p > n and r < n) / "seedpls" subclass
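The dispatch rules above can be restated as a small lookup. The helper seedcca_method is hypothetical and does not call the package. It encodes only the conditions listed in this section, with p and r the numbers of variables in X and Y and n the sample size. For the seeded types it does not check that max(p,r) > n.

```r
# Hypothetical helper restating the documented rules; it does not call seedCCA.
seedcca_method <- function(p, r, n, type = c("seed2", "cca", "seed1", "pls")) {
  type <- match.arg(type)                # default "seed2", as in seedCCA
  switch(type,
    cca = {
      if (max(p, r) >= n) stop('type="cca" needs max(p, r) < n')
      if (min(p, r) == 1) "OLS / seedols" else "standard CCA / finalCCA"
    },
    seed1 = if (p > r) "seeded CCA, reduce X / finalCCA"
            else "seeded CCA, reduce Y / finalCCA",
    seed2 = "seeded CCA, reduce X and Y / finalCCA",
    pls = "PLS / seedpls"
  )
}

seedcca_method(p = 256, r = 4, n = 70, type = "seed1")  # cookie data: X is reduced
seedcca_method(p = 4, r = 1, n = 70, type = "cca")      # falls back to OLS
```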
type="cca" |
Values returned by standard CCA (type="cca", min(p,r) > 1) |
cor |
canonical correlations |
xcoef |
the estimated canonical coefficients for X |
ycoef |
the estimated canonical coefficients for Y |
Xscores |
the estimated canonical variates for X |
Yscores |
the estimated canonical variates for Y |
type="cca" |
Values returned by ordinary least squares (type="cca", min(p,r) = 1) |
coef |
the estimated ordinary least squares coefficients |
X |
X, the first set |
Y |
Y, the second set |
type="seed1" |
Values returned when type="seed1" and p > r (X is initially reduced) |
cor |
canonical correlations |
xcoef |
the estimated canonical coefficients for X |
ycoef |
the estimated canonical coefficients for Y |
proper.u |
a suggested proper number of projections for X |
initialMX0 |
the initialized canonical coefficient matrices of X |
newX |
initially-reduced X |
Y |
the original Y |
Xscores |
the estimated canonical variates for X |
Yscores |
the estimated canonical variates for Y |
type="seed1" |
Values returned when type="seed1" and p <= r (Y is initially reduced) |
cor |
canonical correlations |
xcoef |
the estimated canonical coefficients for X |
ycoef |
the estimated canonical coefficients for Y |
proper.u |
a suggested proper number of projections for Y |
X |
the original X |
initialMY0 |
the initialized canonical coefficient matrices of Y |
newY |
initially-reduced Y |
Xscores |
the estimated canonical variates for X |
Yscores |
the estimated canonical variates for Y |
type="seed2" |
Values returned when type="seed2" |
cor |
canonical correlations |
xcoef |
the estimated canonical coefficients for X |
ycoef |
the estimated canonical coefficients for Y |
proper.ux |
a suggested proper number of projections for X |
proper.uy |
a suggested proper number of projections for Y |
d |
suggested number of eigenvectors of cov(X,Y) |
initialMX0 |
the initialized canonical coefficient matrices of X |
initialMY0 |
the initialized canonical coefficient matrices of Y |
newX |
initially-reduced X |
newY |
initially-reduced Y |
Xscores |
the estimated canonical variates for X |
Yscores |
the estimated canonical variates for Y |
type="pls" |
Values returned when type="pls" |
coef |
the estimated coefficients for each iterative projection up to u |
u |
the maximum number of projections |
X |
predictors |
Y |
response |
scale |
status of scaling predictors |
cases |
the number of observations |
R. D. Cook, B. Li and F. Chiaromonte. Dimension reduction in regression without matrix inversion. Biometrika 2007; 94: 569-584.
Y. Im, H. Gang and JK. Yoo. High-throughput data dimension reduction via seeded canonical correlation analysis, J. Chemometrics 2015; 29: 193-199.
R. A. Johnson and D. W. Wichern. Applied Multivariate Statistical Analysis. Pearson Prentice Hall: New Jersey, USA; 6 edition. 2007; 539-574.
K. Lee and JK. Yoo. Canonical correlation analysis through linear modeling, Aust. N. Z. J. Stat. 2014; 56: 59-72.
###### data(cookie) ######
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
dim(X);dim(Y)
## standard CCA
fit.cca <-seedCCA(X[,1:4], Y, type="cca") ## standard canonical correlation analysis is done.
plot(fit.cca)
## ordinary least squares
fit.ols1 <-seedCCA(X[,1:4], Y[,1], type="cca") ## ordinary least squares is done, because r=1.
fit.ols2 <-seedCCA(Y[,1], X[,1:4], type="cca") ## ordinary least squares is done, because p=1.
## seeded CCA with case 1
fit.seed1 <- seedCCA(X, Y, type="seed1") ## suggested proper value of u is equal to 3.
fit.seed1.ux <- seedCCA(X, Y, ux=6, type="seed1") ## iterative projections done 6 times.
fit.seed1.uy <- seedCCA(Y, X, uy=6, type="seed1", AS=FALSE) ## projections are not stopped automatically before uy=6.
plot(fit.seed1)
## partial least squares
fit.pls1 <- seedCCA(X, Y[,1], type="pls")
fit.pls.m <- seedCCA(X, Y, type="pls") ## multi-dimensional response
par(mfrow=c(1,2))
plot(fit.pls1); plot(fit.pls.m)
######## data(nutrimouse) ########
data(nutrimouse)
X<-as.matrix(nutrimouse$gene)
Y<-as.matrix(nutrimouse$lipid)
dim(X);dim(Y)
## seeded CCA with case 2
fit.seed2 <- seedCCA(X, Y, type="seed2") ## d not specified, so cut=0.9 (default) used.
fit.seed2.99 <- seedCCA(X, Y, type="seed2", cut=0.99) ## cut=0.99 used.
fit.seed2.d3 <- seedCCA(X, Y, type="seed2", d=3) ## d is set to 3.
## ux and uy specified, so proper values not suggested.
fit.seed2.uxuy <- seedCCA(X, Y, type="seed2", ux=10, uy=10)
plot(fit.seed2)
Returns the increments (nFu) of u iterative projections of a seed matrix onto a covariance matrix.
seeding(seed, covx, n, u=10)
seed |
numeric matrix (p * d), a seed matrix |
covx |
numeric matrix (p * p), covariance matrix of X |
n |
numeric, the sample size |
u |
numeric, maximum number of projections |
nFu |
n*Fu values |
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
seed <- cov(X,Y)
covx <- cov(X)
seeding(seed, covx, n=dim(X)[1], u=4)
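The iterative projections of Cook, Li and Chiaromonte (2007) approximate covx^{-1} seed without inverting covx. The approximation at step u is beta_u = R_u (R_u' covx R_u)^{-1} R_u' seed, where R_u = (seed, covx seed, ..., covx^(u-1) seed). The sketch below is hypothetical and is not the package's seeding(). The increment n * sum((beta_{u+1} - beta_u)^2) is an assumed stand-in for nFu, so the package's exact definition may differ. The sketch checks that once R_u spans the whole space, beta_u equals solve(covx, seed).

```r
# Hypothetical sketch of iterative (Krylov-type) projections, not the package's seeding().
# Assumptions: beta_u = R_u (R_u' covx R_u)^{-1} R_u' seed with
# R_u = (seed, covx seed, ..., covx^(u-1) seed); increment = n * sum((beta_{u+1} - beta_u)^2).
seeding_sketch <- function(seed, covx, n, u = 10) {
  betas <- vector("list", u)
  R <- NULL
  block <- seed
  for (k in seq_len(u)) {
    R <- cbind(R, block)                                   # R_k
    betas[[k]] <- R %*% solve(crossprod(R, covx %*% R), crossprod(R, seed))
    block <- covx %*% block                                # next block: covx^k seed
  }
  nFu <- vapply(seq_len(u - 1),
                function(k) n * sum((betas[[k + 1]] - betas[[k]])^2), numeric(1))
  list(beta = betas, nFu = nFu)
}

set.seed(3)
n <- 40; p <- 6
A <- matrix(rnorm(n * p), n, p)
covx <- cov(A)
seed <- matrix(rnorm(p * 2), p, 2)
out <- seeding_sketch(seed, covx, n = n, u = 3)   # 3 blocks x 2 columns = p
out$nFu
all.equal(out$beta[[3]], solve(covx, seed))       # full space reached: exact solution
```

In this sketch, u * ncol(seed) must not exceed p. Beyond that point R_u' covx R_u becomes singular.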
Returns the increments (nFu) of the iterative projections of a seed matrix onto a covariance matrix up to k. Here k is chosen automatically as the point at which the terminating condition eps (which users can set) is satisfied.
seeding.auto.stop(seed, covx, n, u.max=30, eps=0.01)
seed |
numeric matrix (p * d), a seed matrix |
covx |
numeric matrix (p * p), covariance matrix of X |
n |
numeric, the sample size |
u.max |
numeric, maximum number of projections. The default value is 30. |
eps |
numeric, a value of a condition for terminating the projection. The default value is equal to 0.01. |
nFu |
n*Fu values |
u |
the number of projections chosen by the terminating condition |
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
seed <- cov(X,Y)
covx <- cov(X)
seeding.auto.stop(seed, covx, n=dim(X)[1])
seeding.auto.stop(seed, covx, n=dim(X)[1], u.max=20, eps=0.001)
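The automatic stop can be sketched as a loop that adds one projection at a time and stops when the increment falls below eps, or when u.max is reached. The helper auto_stop_sketch below is hypothetical and is not the package's seeding.auto.stop(). It makes the same assumptions about beta_u and the increment as a plain Krylov construction would: beta_u = R_u (R_u' covx R_u)^{-1} R_u' seed and increment n * sum((beta_{u+1} - beta_u)^2). It also stops before R_u would have more columns than rows.

```r
# Hypothetical sketch of an automatic stopping rule, not the package's code.
# Assumptions: beta_u = R_u (R_u' covx R_u)^{-1} R_u' seed,
# R_u = (seed, covx seed, ..., covx^(u-1) seed), increment = n * sum((beta_{u+1} - beta_u)^2).
auto_stop_sketch <- function(seed, covx, n, u.max = 30, eps = 0.01) {
  beta_of <- function(R) R %*% solve(crossprod(R, covx %*% R), crossprod(R, seed))
  R <- seed
  block <- seed
  beta_old <- beta_of(R)
  nFu <- numeric(0)
  while (ncol(R) %/% ncol(seed) < u.max &&
         ncol(R) + ncol(seed) <= nrow(seed)) {   # keep R_u' covx R_u invertible
    block <- covx %*% block                      # next block of the Krylov basis
    R <- cbind(R, block)
    beta_new <- beta_of(R)
    u <- length(nFu) + 1                         # projections before this step
    nFu[u] <- n * sum((beta_new - beta_old)^2)
    if (nFu[u] < eps) return(list(nFu = nFu, u = u))
    beta_old <- beta_new
  }
  list(nFu = nFu, u = ncol(R) %/% ncol(seed))    # stopped by u.max or by p
}

set.seed(4)
A <- matrix(rnorm(40 * 6), 40, 6)
covx <- cov(A)
seed <- matrix(rnorm(6 * 2), 6, 2)
res_inf <- auto_stop_sketch(seed, covx, n = 40, eps = Inf)  # stops after 1 projection
res0 <- auto_stop_sketch(seed, covx, n = 40, eps = 0)       # never meets eps
res0$u
```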
Returns ordinary least squares estimates as an object of subclass "seedols". Either X or Y must be one-dimensional. X and Y need not be the predictors and the response, respectively: regardless of the order of the arguments, the one-dimensional variable becomes the response and the multi-dimensional variables become the predictors.
seedols(X, Y)
X |
numeric vector or matrix, a first set of variables |
Y |
numeric vector or matrix, a second set of variables |
coef |
the estimated ordinary least squares coefficients |
X |
X, the first set |
Y |
Y, the second set |
######## data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
ols1 <- seedols(X[,1:4],Y[,1])
ols2 <- seedols(Y[,1],X[,1:4])
## ols1 and ols2 give the same results.
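The argument-order rule described above can be sketched in base R. The helper ols_either_way is hypothetical and is not the package's seedols(). Whichever argument has a single column becomes the response, and the other becomes the predictors. Including an intercept is an assumption of this sketch.

```r
# Hypothetical sketch of the argument-order rule, not the package's seedols():
# the one-column argument becomes the response (intercept included by assumption).
ols_either_way <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  if (ncol(X) == 1 && ncol(Y) > 1) {     # swap so that Y is the 1-D response
    tmp <- X; X <- Y; Y <- tmp
  }
  if (ncol(Y) != 1) stop("either X or Y must be one-dimensional")
  lm.fit(cbind(1, X), drop(Y))$coefficients
}

set.seed(11)
X <- matrix(rnorm(50 * 4), 50, 4)
y <- drop(X %*% c(1, -2, 0.5, 0)) + rnorm(50)
b1 <- ols_either_way(X, y)
b2 <- ols_either_way(y, X)    # same coefficients with the arguments swapped
```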
Returns partial least squares estimates through iterative projections. The result is an object of subclass "seedpls".
seedpls(X, Y, u=5, scale=FALSE)
X |
numeric matrix (n * p), a set of predictors |
Y |
numeric vector or matrix (n * r), responses (it can be multi-dimensional) |
u |
numeric, the number of projections. The default is 5. |
scale |
logical, the default is FALSE. If TRUE, each predictor is standardized to mean 0 and variance 1
coef |
the estimated coefficients for each iterative projection up to u |
u |
the maximum number of projections |
X |
Predictors |
Y |
Response |
scale |
status of scaling predictors |
######## data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
fit.pls1 <- seedpls(X,Y[,1]) ## one-dimensional response
fit.pls2 <- seedpls(X,Y, u=6, scale=TRUE) ## four-dimensional response
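PLS coefficients can be written as iterative projections with seed s = cov(X, Y). The coefficient after u projections is beta_u = R_u (R_u' Sx R_u)^{-1} R_u' s, where R_u = (s, Sx s, ..., Sx^(u-1) s) and Sx = cov(X). This is consistent with Cook et al. (2007). The helper pls_coef_sketch below is hypothetical and is not the package's seedpls(). When u * ncol(s) reaches p, R_u spans the whole space and beta_u coincides with the OLS slopes. The fitted-value line uses an intercept built from the means, which is an assumption.

```r
# Hypothetical sketch, not the package's seedpls(): PLS coefficients for
# u = 1, ..., u via beta_u = R_u (R_u' Sx R_u)^{-1} R_u' s with s = cov(X, Y).
pls_coef_sketch <- function(X, Y, u = 5, scale = FALSE) {
  Xc <- base::scale(as.matrix(X), center = TRUE, scale = scale)
  Y <- as.matrix(Y)
  Sx <- cov(Xc)
  s <- cov(Xc, Y)                       # seed matrix (p x r)
  coefs <- vector("list", u)
  R <- NULL
  block <- s
  for (k in seq_len(u)) {
    R <- cbind(R, block)                # R_k = (s, Sx s, ..., Sx^(k-1) s)
    coefs[[k]] <- R %*% solve(crossprod(R, Sx %*% R), crossprod(R, s))
    block <- Sx %*% block
  }
  coefs
}

set.seed(5)
n <- 100; p <- 4
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(2, 0, -1, 0.5)) + rnorm(n)
coefs <- pls_coef_sketch(X, y, u = 4)
# fitted values after 2 projections (intercept from the means: an assumption)
yhat2 <- mean(y) + base::scale(X, scale = FALSE) %*% coefs[[2]]
```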
The usage of selectu depends on its argument type. If type="seed1", n*F_u is computed for whichever of X and Y has the higher dimension, and a proper number of projections is reported. For example, if X has a higher dimension than Y, then selectu(X, Y, type="seed1") and selectu(Y, X, type="seed1") give the same results, both for X. If type="seed2", n*F_u is computed for both X and Y, and proper numbers of projections for X and Y are reported. For type="seed2", num.d must be specified; its default value is 2. The argument eps is the terminating condition: the projection stops when the increment is less than eps. With auto.stop=TRUE, the function stops automatically as soon as the increment is less than eps; otherwise, the increments are computed until u.max is reached. The function selectu returns an object of subclass "selectu".
selectu(X, Y, type="seed2", u.max=30, auto.stop=TRUE, num.d=2, eps=0.01)
X |
numeric matrix (n * p), the first set of variables |
Y |
numeric matrix (n * r), the second set of variables |
type |
character, the default is "seed2". "seed1" is for the first case of seeded CCA (one set of variables is initially reduced). "seed2" is for the second case of seeded CCA (both sets of variables are initially reduced). |
u.max |
numeric, the maximum number of projections. The default is 30. |
auto.stop |
logical, the default value is TRUE. If TRUE, the iterative projection is stopped automatically when the termination condition eps is satisfied. If FALSE, the iterative projections continue up to u.max. |
num.d |
numeric, the number of largest eigenvectors of cov(first.set, second.set) to use. The default value is 2. This option works only for type="seed2". |
eps |
numeric, the default value is equal to 0.01. A value for terminating the projection. |
The returned values depend on type and are listed below in this order:
type="seed1"
type="seed2"
type="seed1" |
Values returned when type="seed1" |
nFu |
increments (n*Fu) of the iterative projection for initially reducing one set of variables. |
proper.u |
proper value of the number of projections for X |
type |
types of seeded CCA |
eps |
a value for terminating the projection. The default value is equal to 0.01. |
type="seed2" |
Values returned when type="seed2" |
nFu.x |
increments (n*Fu) of the iterative projection for initially reducing X. |
nFu.y |
increments (n*Fu) of the iterative projection for initially reducing Y. |
proper.ux |
proper value of the number of projections for X |
proper.uy |
proper value of the number of projections for Y |
type |
types of seeded CCA |
eps |
a value for terminating the projection. The default value is equal to 0.01. |
###### data(cookie) ######
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
selectu(X, Y, type="seed1")
selectu(X, Y, type="seed1", auto.stop=FALSE)
selectu(X, Y, type="seed2", eps=0.001, num.d=3)
selectu(X, Y, type="seed2", auto.stop=FALSE)
######## data(nutrimouse) ########
data(nutrimouse)
Y<-as.matrix(nutrimouse$lipid)
X<-as.matrix(nutrimouse$gene)
selectu(X, Y, type="seed2", num.d=4)
selectu(X, Y, type="seed2", num.d=4, eps=0.001)
selectu(X, Y, type="seed2", auto.stop=FALSE, num.d=4, eps=0.001)