Package 'seedCCA' reference manual

Title:	Seeded Canonical Correlation Analysis
Description:	Functions for dimension reduction through the seeded canonical correlation analysis are provided. A classical canonical correlation analysis (CCA) is one of useful statistical methods in multivariate data analysis, but it is limited in use due to the matrix inversion for large p small n data. To overcome this, a seeded CCA has been proposed in Im, Gang and Yoo (2015) \doi{10.1002/cem.2691}. The seeded CCA is a two-step procedure. The sets of variables are initially reduced by successively projecting cov(X,Y) or cov(Y,X) onto cov(X) and cov(Y), respectively, without loss of information on canonical correlation analysis, following Cook, Li and Chiaromonte (2007) \doi{10.1093/biomet/asm038} and Lee and Yoo (2014) \doi{10.1111/anzs.12057}. Then, the canonical correlation is finalized with the initially-reduced two sets of variables.
Authors:	Jae Keun Yoo, Bo-Young Kim
Maintainer:	Jae Keun Yoo <[email protected]>
License:	GPL (>= 2.0)
Version:	3.1
Built:	2024-12-01 08:48:43 UTC
Source:	CRAN

Coefficients of ordinary and partial least squares through iterative projections

Description

Returns coefficients of partial least squares through iterative projections. It works only for subclasses "seedols" and seedpls".

Usage

## S3 method for class 'seedCCA'
coef(object, u=NULL,...)
## S3 method for class 'seedCCA'
coef(object, u=NULL,...)

Arguments

`object`	The name of an object of class "seedCCA"
`u`	numeric, the number of projections. The default is NULL. This option is valid for PLS alone. The option returns the coefficient estimates for u projections. For example, if it is specified at k, then the coefficient estimates with k projections are returned.
`...`	arguments passed to the coef.method

Examples

########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

fit.ols1 <- seedCCA(X[,1:4], Y[,1], type="cca")
fit.pls1 <- seedCCA(X,Y[,1],type="pls")

coef(fit.ols1)
coef(fit.pls1)
coef(fit.pls1, u=4)
########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

fit.ols1 <- seedCCA(X[,1:4], Y[,1], type="cca")
fit.pls1 <- seedCCA(X,Y[,1],type="pls")

coef(fit.ols1)
coef(fit.pls1)
coef(fit.pls1, u=4)

cookie dataset

Description

This data set contains measurements from quantitative NIR spectroscopy. The example studied arises from an experiment done to test the feasibility of NIR spectroscopy to measure the composition of biscuit dough pieces (formed but unbaked biscuits). Two similar sample sets were made up, with the standard recipe varied to provide a large range for each of the four constituents under investigation: fat, sucrose, dry flour, and water. The calculated percentages of these four ingredients represent the 4 responses. There are 40 samples in the calibration or training set (with sample 23 being an outlier) and a further 32 samples in the separate prediction or validation set (with example 21 considered as an outlier).

An NIR reflectance spectrum is available for each dough piece. The spectral data consist of 700 points measured from 1100 to 2498 nanometers (nm) in steps of 2 nm.

Usage

data(cookie)
data(cookie)

Format

A data frame of dimension 72 x 704. The first 700 columns correspond to the NIR reflectance spectrum, the last four columns correspond to the four constituents fat, sucrose, dry flour, and water. The first 40 rows correspond to the calibration data, the last 32 rows correspond to the prediction data.

References

Please cite the following papers if you use this data set.

P.J. Brown, T. Fearn, and M. Vannucci (2001) Bayesian Wavelet Regression on Curves with Applications to a Spectroscopic Calibration Problem. Journal of the American Statistical Association, 96, pp. 398-408.

B.G. Osborne, T. Fearn, A.R. Miller, and S. Douglas (1984) Application of Near-Infrared Reflectance Spectroscopy to Compositional Analysis of Biscuits and Biscuit Dough. Journal of the Science of Food and Agriculture, 35, pp. 99 - 105.

Examples

    data(cookie) # load data
    X<-as.matrix(cookie[,1:700]) # extract NIR spectra
    Y<-as.matrix(cookie[,701:704]) # extract constituents
    Xtrain<-X[1:40,] # extract training data
    Ytrain<-Y[1:40,] # extract training data
    Xtest<-X[41:72,] # extract test data
    Ytest<-Y[41:72,] # extract test data
data(cookie) # load data
    X<-as.matrix(cookie[,1:700]) # extract NIR spectra
    Y<-as.matrix(cookie[,701:704]) # extract constituents
    Xtrain<-X[1:40,] # extract training data
    Ytrain<-Y[1:40,] # extract training data
    Xtest<-X[41:72,] # extract test data
    Ytest<-Y[41:72,] # extract test data

scree-ploting cov(X, Y)

Description

Returns a scree-plot of the eigenvalues of cov(first.set, second.set) to select its first d largest eigenvectors.

Usage

covplot(X, Y, mind=NULL)covplot(X, Y, mind=NULL)

Arguments

`X`	numeric matrix (n * p), X
`Y`	numeric matrix (n * r), Y
`mind`	numeric, the number of the eigenvalues to show their cumulative percentages. The default is NULL, and then it is equal to min(p,r)

Value

`eigenvalues`	the ordiered eigenvalues of cov(X,Y)
`cum.percent`	the cumulative percentages of the eigenvalues
`num.evecs`	a vector of the numbers of the eigenvectors which forces the cumulative percentages bigger than 0.6, 0.7, 0.8, 0.9

Examples

data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
covplot(X, Y)
covplot(X, Y, mind=4)
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
covplot(X, Y)
covplot(X, Y, mind=4)

finalized CCA in seeded CCA

Description

Returns the results of the finalized step in seeded CCA.

Usage

finalCCA(X, Y)finalCCA(X, Y)

Arguments

`X`	numeric matrix (n * d), the initially-CCAed first set of variables
`Y`	numeric matrix (n * d), the initially-CCAed second set of variables

Value

`cor`	canonical correlations in finalized step
`xcoef`	the estimated canonical coefficient matrix of the initially-CCAed first set of variables
`ycoef`	the estimated canonical coefficient matrix of the initially-CCAed second set of variables
`Xscores`	the finalized canonical variates of the first set of variables
`Yscores`	the finalized canonical variates of the second set of variables

Examples

########  data(cookie) ########
data(cookie)
myseq <- seq(141, 651, by=2)
X <- as.matrix(cookie[-c(23,61), myseq])
Y <- as.matrix(cookie[-c(23,61), 701:704])
min.pr <- min( dim(X)[2], dim(Y)[2])
MX0 <- iniCCA(X, Y, u=4, num.d=min.pr)
ini.X <- X %*% MX0
finalCCA(ini.X, Y)

########  data(nutrimouse) ########
data(nutrimouse)
Y<-as.matrix(nutrimouse$lipid)
X<-as.matrix(nutrimouse$gene)
MX0 <- iniCCA(X, Y, u=4, num.d=4)
MY0 <- iniCCA(Y, X, u=5, num.d=4)
ini.X <- X %*% MX0
ini.Y <- Y %*% MY0
finalCCA(ini.X, ini.Y)
########  data(cookie) ########
data(cookie)
myseq <- seq(141, 651, by=2)
X <- as.matrix(cookie[-c(23,61), myseq])
Y <- as.matrix(cookie[-c(23,61), 701:704])
min.pr <- min( dim(X)[2], dim(Y)[2])
MX0 <- iniCCA(X, Y, u=4, num.d=min.pr)
ini.X <- X %*% MX0
finalCCA(ini.X, Y)

########  data(nutrimouse) ########
data(nutrimouse)
Y<-as.matrix(nutrimouse$lipid)
X<-as.matrix(nutrimouse$gene)
MX0 <- iniCCA(X, Y, u=4, num.d=4)
MY0 <- iniCCA(Y, X, u=5, num.d=4)
ini.X <- X %*% MX0
ini.Y <- Y %*% MY0
finalCCA(ini.X, ini.Y)

Fitted values of ordinary and partial least squares

Description

Returns fitted values of ordinary and partial least squares through iterative projections. It works only for subclasses "seedols" and "seedpls".

Usage

## S3 method for class 'seedCCA'
fitted(object, u=NULL,...)
## S3 method for class 'seedCCA'
fitted(object, u=NULL,...)

Arguments

`object`	The name of an object of class "seedCCA"
`u`	numeric, the number of projections. The default is NULL. This option is valid for PLS alone. The option returns the fitted values for u projections. For example, if it is specified at k, then the fitted values with k projections are returned.
`...`	arguments passed to the fitted.method

Examples

########  data(cookie) ########
########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

fit.ols1 <- seedCCA(X[,1:4], Y[,1], type="cca")
fit.pls1 <- seedCCA(X, Y[,1], type="pls")
fit.pls2 <- seedCCA(X, Y[,1], type="pls", scale=TRUE)

fitted(fit.ols1)
fitted(fit.pls1)
fitted(fit.pls1, u=4)
fitted(fit.pls2, u=4)
########  data(cookie) ########
########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

fit.ols1 <- seedCCA(X[,1:4], Y[,1], type="cca")
fit.pls1 <- seedCCA(X, Y[,1], type="pls")
fit.pls2 <- seedCCA(X, Y[,1], type="pls", scale=TRUE)

fitted(fit.ols1)
fitted(fit.pls1)
fitted(fit.pls1, u=4)
fitted(fit.pls2, u=4)

Initialized CCA in seeded CCA

Description

Returns the canonical coefficient matrices from the initialized step in seeded CCA. The initialzied CCA is done only for the first set in its first argument. The num.d must be less than or equal to the dimension of the second set.

Usage

iniCCA(X, Y, u, num.d)iniCCA(X, Y, u, num.d)

Arguments

`X`	numeric matrix (n * p), the first set of variables: this set of variables alone is reduced.
`Y`	numeric matrix (n * r), the second set of variables
`u`	numeric, the terminiating index of the projection
`num.d`	numeric, the first "num.d" eigenvectors of cov(X,Y) to replace cov(X,Y), if min(p,r) relatively bigger than n. The `num.d` must be less than or equal to min(p,r).

Value

`B`	the initialized CCAed coefficient matrix projected by the value of `u`

Examples

########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
min.pr <- min( dim(X)[2], dim(Y)[2])
MX0 <- iniCCA(X, Y, u=4, num.d=min.pr)
ini.X <- X%*%MX0

########  data(nutrimouse) ########
data(nutrimouse)
Y<-as.matrix(nutrimouse$lipid)
X<-as.matrix(nutrimouse$gene)
MX0 <- iniCCA(X, Y, u=4, num.d=4)
MY0 <- iniCCA(Y, X, u=5, num.d=4)
ini.X <- X %*% MX0
ini.Y <- Y %*% MY0
########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
min.pr <- min( dim(X)[2], dim(Y)[2])
MX0 <- iniCCA(X, Y, u=4, num.d=min.pr)
ini.X <- X%*%MX0

########  data(nutrimouse) ########
data(nutrimouse)
Y<-as.matrix(nutrimouse$lipid)
X<-as.matrix(nutrimouse$gene)
MX0 <- iniCCA(X, Y, u=4, num.d=4)
MY0 <- iniCCA(Y, X, u=5, num.d=4)
ini.X <- X %*% MX0
ini.Y <- Y %*% MY0

Nutrimouse dataset

Description

The nutrimouse dataset comes from a nutrition study in the mouse. It was provided by Pascal Martin from the Toxicology and Pharmacology Laboratory (French National Institute for Agronomic Research).

Usage

data(nutrimouse)
data(nutrimouse)

Format

gene: data frame (40 * 120) with numerical variables

lipid: data frame (40 * 21) with numerical variables

diet: factor vector (40)

genotype: factor vector (40)

Details

Two sets of variables were measured on 40 mice:

expressions of 120 genes potentially involved in nutritional problems.

concentrations of 21 hepatic fatty acids: The 40 mice were distributed in a 2-factors experimental design (4 replicates).

Genotype (2-levels factor): wild-type and PPARalpha -/-

Diet (5-levels factor): Oils used for experimental diets preparation were corn and colza oils (50/50) for a reference diet (REF), hydrogenated coconut oil for a saturated fatty acid diet (COC), sunflower oil for an Omega6 fatty acid-rich diet (SUN), linseed oil for an Omega3-rich diet (LIN) and corn/colza/enriched fish oils for the FISH diet (43/43/14).

References

P. Martin, H. Guillou, F. Lasserre, S. D??jean, A. Lan, J-M. Pascussi, M. San Cristobal, P. Legrand, P. Besse, T. Pineau (2007) Novel aspects of PPARalpha-mediated regulation of lipid and xenobiotic metabolism revealed through a nutrigenomic study. Hepatology, 45, 767???777

Examples

data(nutrimouse)
boxplot(nutrimouse$lipid)
data(nutrimouse)
boxplot(nutrimouse$lipid)

Plotting class "seedCCA" depending on the value of `type`

Description

The function is for plotting class "seedCCA". Depending on subclass defined by the value of type, its resulting plot is different.

Usage

## S3 method for class 'seedCCA'
plot(x, ref=90, eps=0.01, ...)
## S3 method for class 'seedCCA'
plot(x, ref=90, eps=0.01, ...)

Arguments

`x`	The name of an object of class "seedCCA"
`ref`	numeric, the value for reference line. It must be chosen between 0 and 100. It works only for subclass "finalCCA".
`eps`	numeric, a value to terminate projections. It must be chosen between 0 and 1. The default is equal to 0.01. It works only for subclass "seedpls".
`...`	arguments passed to the plot.method

Details

subclass "finalCCA": the function makes a plot for percents of cumulative canonical correlations.

subclass "seedpls": the function returns a proper number of projections and plot of the projection increment against the number of projections. A proper number of projections is indicated with a blue-color vertical bar in the plot. Only for subclass "seedpls", the output is retured. See Value part.

subclass "seedols": No plotting

subclass "selectu": the function makes a plot for increment of iterative projections by the output of subclass "selectu".

Value

`proper.u`	proper value of the number of projections
`nFu`	incrments (n*Fu) of the iterative projection.
`u`	the maximum number of projections from "seedpls" object
`eps`	a value for terminating the projection. The default value is equal to 0.01.

Examples

########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

fit.cca <- seedCCA(X[,1:4],Y[,1:4],type="cca")
fit.seed1 <- seedCCA(X,Y, type="seed1")
fit.pls1 <- seedCCA(X,Y[,1],type="pls")
fit.selu <- selectu(X,Y, type="seed2")

plot(fit.cca)
plot(fit.seed1, ref=95)
plot(fit.pls1)
plot(fit.pls1, eps=0.00001)
plot(fit.selu)
########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

fit.cca <- seedCCA(X[,1:4],Y[,1:4],type="cca")
fit.seed1 <- seedCCA(X,Y, type="seed1")
fit.pls1 <- seedCCA(X,Y[,1],type="pls")
fit.selu <- selectu(X,Y, type="seed2")

plot(fit.cca)
plot(fit.seed1, ref=95)
plot(fit.pls1)
plot(fit.pls1, eps=0.00001)
plot(fit.selu)

Projection of a seed matrix on to the column subspace of M with respect to `Sx` inner-product

Description

The function reuturns a projection of a seed matrix on to the column subspace of M with respect to Sx inner-product.

Usage

Pm(M, Sx, seed)
Pm(M, Sx, seed)

Arguments

`M`	numeric matrix (p * k), a basis matrix of the column space of M
`Sx`	a inner-product matrix
`seed`	seed matrix

Examples

data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

## using cov(X,Y) as a seed matrix
seed <- cov(X,Y)
col.num <- dim(seed)[2]
M <- iniCCA(X, Y, u=4, num.d=col.num)
Sx <- cov(X)
Pm(M, Sx, seed)

## using the first 2 largest eigenvectors of cov(X,Y) as a seed matrix
seed2 <- svd(cov(X,Y))$u[,1:2]
M2 <- iniCCA(X, Y, u=4, num.d=2)
Pm(M, Sx, seed2)
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

## using cov(X,Y) as a seed matrix
seed <- cov(X,Y)
col.num <- dim(seed)[2]
M <- iniCCA(X, Y, u=4, num.d=col.num)
Sx <- cov(X)
Pm(M, Sx, seed)

## using the first 2 largest eigenvectors of cov(X,Y) as a seed matrix
seed2 <- svd(cov(X,Y))$u[,1:2]
M2 <- iniCCA(X, Y, u=4, num.d=2)
Pm(M, Sx, seed2)

basic function for printing class "seedCCA"

Description

The function controls to print class "seedCCA". The function prints the estimated coefficents, if they exist. For subclass "finalCCA", canonical correlations are additionally reported. For subsclass "selectu", increments, suggested number of projections and the values of type and eps are reported.

Usage

## S3 method for class 'seedCCA'
print(x,...)
## S3 method for class 'seedCCA'
print(x,...)

Arguments

`x`	The name of an object of class "seedCCA"
`...`	arguments passed to the print.method

Examples

########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

fit.seed2 <- seedCCA(X,Y)
fit.seed2
print(fit.seed2)
########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

fit.seed2 <- seedCCA(X,Y)
fit.seed2
print(fit.seed2)

Seeded Canonical correlation analysis

Description

The function seedCCA is mainly for implementing seeded canonical correlation analysis proposed by Im et al. (2015). The function conducts the following four methods, depending on the value of type. The option type has one of c("cca", "seed1", "seed2", "pls").

Usage

seedCCA(X,Y,type="seed2",ux=NULL,uy=NULL,u=10,eps=0.01,cut=0.9,d=NULL,AS=TRUE,scale=FALSE)
seedCCA(X,Y,type="seed2",ux=NULL,uy=NULL,u=10,eps=0.01,cut=0.9,d=NULL,AS=TRUE,scale=FALSE)

Arguments

`X`	numeric vector or matrix (n * p), the first set of variables
`Y`	numeric vector or matrix (n * r), the second set of variables
`type`	character, a choice of methods among `c("cca", "seed1", "seed2", "pls")`. The default is `"seed2"`.
`ux`	numeric, maximum number of projections for X. The default is NULL. If this is not NULL, it surpasses the option `u` with `type="seed1"` and p>r. For `type="seed2"`, if `ux` and `uy` are not NULL, they surpass `u`.
`uy`	numeric, maximum number of projections for Y. The default is NULL. If this is not NULL, it surpasses the option `u` with `type="seed1"` and p<r. For `type="seed2"`, if `ux` and `uy` are not NULL, they surpass `u`.
`u`	numeric, maximum number of projections. The default is 10. This is used for `type="seed1"`, `type="seed2"` and tyepe="pls".
`eps`	numeric, the criteria to terminate iterative projections. The default is 0.01. If increment of projections is less than `eps`, then the iterative projection is terminated.
`cut`	numeric, between 0 and 1. The default is 0.9. If `d` is NULL, `cut` is used for automatic replacements of cov(X,Y) and cov(Y,X) with their eigenvectors, depending on the value of `cut`. So, if any value of `d` is given, `cut` is not effective. cov(X,Y) and cov(Y,X) are replaced with their largest eigenvectors, whose cumulative eigenvalue proportion is bigger than the value of `cut`. This only works for `type="seed2"`.
`d`	numeric, the user-selected number of largest eigenvectors of cov(X, Y) and cov(Y, X). The default is NULL. This only works for `type="seed2"`. If any value of `d` is given, `cut` does not work.
`AS`	logical, status of automatic stop of projections. The default is `TRUE`. If TRUE, the iterative projection is automatically stopped, when the terminaion condition `eps` is satisfied. If`AS=FALSE`, the iterative projections are stopped at the value of `u`.
`scale`	logical. scaling predictors to have zero mean and one standard deviation. The default is `FALSE`. If `scale=TRUE`, each predictor is scaled with mean 0 and variance 1 for partial least squares. This option works only for `type="pls"`.

Details

Let p and r stand for the numbers of variables in the two sets and n stands for the sample size. The option of type="cca" can work only when max(p,r) < n, and seedCCA conducts standard canonical correlation analysis (Johnson and Wichern, 2007). If type="cca" is given and either p or r is equal to one, ordinary least squares (OLS) is done instead of canonical correlation analysis. If max(p,r) >= n, either type="seed1" or type="seed2" has to be chosen. This is the main purpose of seedCCA. If type="seed1", only one set of variables, saying X with p for convenience, to have more variables than the other, saying Y with r, is initially reduced by the iterative projection approach (Cook et al. 2007). And then, the canonical correlation analysis of the initially-reduced X and the original Y is finalized. If type="seed2", both X and Y are initially reduced. And then, the canonical correlation analysis of the two initially-reduced X and Y are finalzed. If type="pls", partial least squares (PLS) is done. If type="pls" is given, the first set of variables in seedCCA is predictors and the second set is response. This matters The response can be multivariate. Depeding on the value of type, the resulted subclass by seedCCA are different.:

type="cca": subclass "finalCCA" (p >2; r >2; p,r<n)

type="cca": subclass "seedols" (either p or r is equal to 1.)

type="seed1" and type="seed2": subclass "finalCCA" (max(p,r)>n)

type="pls": subclass "seedpls" (p>n and r <n)

So, plot(object) will result in different figure depending on the object.

The order of the values depending on type is follows.:

type="cca": standard CCA (max(p,r)<n, min(p,r)>1) / "finalCCA" subclass

type="cca": ordinary least squares (max(p,r)<n, min(p,r)=1) / "seedols" subclass

type="seed1": seeded CCA with case1 (max(p,r)>n and p>r) / "finalCCA" subclass

type="seed1": seeded CCA with case1 (max(p,r)>n and p<=r) / "finalCCA" subclass

type="seed2": seeded CCA with case2 (max(p,r)>n) / "finalCCA" subclass

type="pls": partial least squares (p>n and r<n) / "seedpls" subclass

Value

`type="cca"`	Values with selecting `type="cca"`: standard CCA (max(p,r)<n, min(p,r)>1) / "finalCCA" subclass
`cor`	canonical correlations
`xcoef`	the estimated canonical coefficients for X
`ycoef`	the estimated canonical coefficients for Y
`Xscores`	the estimated canonical variates for X
`Yscores`	the estimated canonical variates for Y
`type="cca"`	Values with selecting `type="cca"`: ordinary least squares (max(p,r)<n, min(p,r)=1) / "seedols" subclass
`coef`	the estimated ordinary least squares coefficients
`X`	X, the first set
`Y`	Y, the second set
`type="seed1"`	Values with selecting `type="seed1"`: seeded CCA with case1 (max(p,r)>n and p>r) / "finalCCA" subclass
`cor`	canonical correlations
`xcoef`	the estimated canonical coefficients for X
`ycoef`	the estimated canonical coefficients for Y
`proper.u`	a suggested proper number of projections for X
`initialMX0`	the initialized canonical coefficient matrices of X
`newX`	initially-reduced X
`Y`	the original Y
`Xscores`	the estimated canonical variates for X
`Yscores`	the estimated canonical variates for Y
`type="seed1"`	Values with selecting `type="seed1"`: seeded CCA with case1 (max(p,r)>n and p<=r) / "finalCCA" subclass)
`cor`	canonical correlations
`xcoef`	the estimated canonical coefficients for X
`ycoef`	the estimated canonical coefficients for Y
`proper.u`	a suggested proper number of projections for Y
`X`	the original X
`initialMY0`	the initialized canonical coefficient matrices of Y
`newY`	initially-reduced Y
`Xscores`	the estimated canonical variates for X
`Yscores`	the estimated canonical variates for Y
`type="seed2"`	Values with selecting `type="seed2"`: seeded CCA with case2 (max(p,r)>n) / "finalCCA" subclass
`cor`	canonical correlations
`xcoef`	the estimated canonical coefficients for X
`ycoef`	the estimated canonical coefficients for Y
`proper.ux`	a suggested proper number of projections for X
`proper.uy`	a suggested proper number of projections for Y
`d`	suggested number of eigenvectors of cov(X,Y)
`initialMX0`	the initialized canonical coefficient matrices of X
`initialMY0`	the initialized canonical coefficient matrices of Y
`newX`	initially-reduced X
`newY`	initially-reduced Y
`Xscores`	the estimated canonical variates for X
`Yscores`	the estimated canonical variates for Y
`type="pls"`	Values with selecting `type="pls"`:: partial least squares (p>n and r<n) / "seedpls" subclass
`coef`	the estimated coefficients for each iterative projection upto u
`u`	the maximum number of projections
`X`	predictors
`Y`	response
`scale`	status of scaling predictors
`cases`	the number of observations

References

R. D. Cook, B. Li and F. Chiaromonte. Dimension reduction in regression without matrix inversion. Biometrika 2007; 94: 569-584.

Y. Im, H. Gang and JK. Yoo. High-throughput data dimension reduction via seeded canonical correlation analysis, J. Chemometrics 2015; 29: 193-199.

R. A. Johnson and D. W. Wichern. Applied Multivariate Statistical Analysis. Pearson Prentice Hall: New Jersey, USA; 6 edition. 2007; 539-574.

K. Lee and JK. Yoo. Canonical correlation analysis through linear modeling, AUST. NZ. J. STAT. 2014; 56: 59-72.

Examples

######  data(cookie) ######
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
dim(X);dim(Y)

## standard CCA
fit.cca <-seedCCA(X[,1:4], Y, type="cca")  ## standard canonical correlation analysis is done.
plot(fit.cca)

## ordinary least squares
fit.ols1 <-seedCCA(X[,1:4], Y[,1], type="cca")  ## ordinary least squares is done, because r=1.
fit.ols2 <-seedCCA(Y[,1], X[,1:4], type="cca")  ## ordinary least squares is done, because p=1.

## seeded CCA with case 1
fit.seed1 <- seedCCA(X, Y, type="seed1") ## suggested proper value of u is equal to 3.
fit.seed1.ux <- seedCCA(X, Y, ux=6, type="seed1") ## iterative projections done 6 times.
fit.seed1.uy <- seedCCA(Y, X, uy=6, type="seed1", AS=FALSE)  ## projections not done until uy=6.
plot(fit.seed1)

## partial least squares
fit.pls1 <- seedCCA(X, Y[,1], type="pls")
fit.pls.m <- seedCCA(X, Y, type="pls") ## multi-dimensional response
par(mfrow=c(1,2))
plot(fit.pls1); plot(fit.pls.m)


########  data(nutrimouse) ########
data(nutrimouse)
X<-as.matrix(nutrimouse$gene)
Y<-as.matrix(nutrimouse$lipid)
dim(X);dim(Y)

## seeded CCA with case 2
fit.seed2 <- seedCCA(X, Y, type="seed2")  ## d not specified, so cut=0.9 (default) used.
fit.seed2.99 <- seedCCA(X, Y, type="seed2", cut=0.99)  ## cut=0.99 used.
fit.seed2.d3 <- seedCCA(X, Y, type="seed2", d=3)  ## d is specified with 3.

## ux and uy specified, so proper values not suggested.
fit.seed2.uxuy <- seedCCA(X, Y, type="seed2", ux=10, uy=10)
plot(fit.seed2)
######  data(cookie) ######
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
dim(X);dim(Y)

## standard CCA
fit.cca <-seedCCA(X[,1:4], Y, type="cca")  ## standard canonical correlation analysis is done.
plot(fit.cca)

## ordinary least squares
fit.ols1 <-seedCCA(X[,1:4], Y[,1], type="cca")  ## ordinary least squares is done, because r=1.
fit.ols2 <-seedCCA(Y[,1], X[,1:4], type="cca")  ## ordinary least squares is done, because p=1.

## seeded CCA with case 1
fit.seed1 <- seedCCA(X, Y, type="seed1") ## suggested proper value of u is equal to 3.
fit.seed1.ux <- seedCCA(X, Y, ux=6, type="seed1") ## iterative projections done 6 times.
fit.seed1.uy <- seedCCA(Y, X, uy=6, type="seed1", AS=FALSE)  ## projections not done until uy=6.
plot(fit.seed1)

## partial least squares
fit.pls1 <- seedCCA(X, Y[,1], type="pls")
fit.pls.m <- seedCCA(X, Y, type="pls") ## multi-dimensional response
par(mfrow=c(1,2))
plot(fit.pls1); plot(fit.pls.m)


########  data(nutrimouse) ########
data(nutrimouse)
X<-as.matrix(nutrimouse$gene)
Y<-as.matrix(nutrimouse$lipid)
dim(X);dim(Y)

## seeded CCA with case 2
fit.seed2 <- seedCCA(X, Y, type="seed2")  ## d not specified, so cut=0.9 (default) used.
fit.seed2.99 <- seedCCA(X, Y, type="seed2", cut=0.99)  ## cut=0.99 used.
fit.seed2.d3 <- seedCCA(X, Y, type="seed2", d=3)  ## d is specified with 3.

## ux and uy specified, so proper values not suggested.
fit.seed2.uxuy <- seedCCA(X, Y, type="seed2", ux=10, uy=10)
plot(fit.seed2)

increments of iterative projections

Description

Returns increments (nFu) of iterative projections of a seed matrix onto a covariance matrix u times.)

Usage

seeding(seed, covx, n,  u=10)
seeding(seed, covx, n,  u=10)

Arguments

`seed`	numeric matrix (p * d), a seed matrix
`covx`	numeric matrix (p * p), covariance matrix of X
`n`	numeric, sample sizes
`u`	numeric, maximum number of projections

Value

nFu

n*Fu values

Examples

data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
seed <- cov(X,Y)
covx <- cov(X)
seeding(seed, covx, n=dim(X)[1], u=4)
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
seed <- cov(X,Y)
covx <- cov(X)
seeding(seed, covx, n=dim(X)[1], u=4)

increments of iterative projections with automatic stopping

Description

Returns increments (nFu) of iterative projections of a seed matrix onto a covariance matrix upto k, which properly chosen by satisfying the terminating condition eps (eps can be selected by users).

Usage

seeding.auto.stop(seed, covx, n, u.max=30, eps=0.01)
seeding.auto.stop(seed, covx, n, u.max=30, eps=0.01)

Arguments

`seed`	numeric matrix (p * d), a seed matrix
`covx`	numeric matrix (p * p), covariance matrix of X
`n`	numeric, sample sizes
`u.max`	numeric, maximum number of projection. The default value is equal to 30.
`eps`	numeric, a value of a condition for terminating the projection. The default value is equal to 0.01.

Value

`nFu`	n*Fu values
`u`	the number of projection properly chosen by satisfying the terminating condition `eps`

Examples

data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
seed <- cov(X,Y)
covx <- cov(X)
seeding.auto.stop(seed, covx, n=dim(X)[1])
seeding.auto.stop(seed, covx, n=dim(X)[1], u.max=20, eps=0.001)
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])
seed <- cov(X,Y)
covx <- cov(X)
seeding.auto.stop(seed, covx, n=dim(X)[1])
seeding.auto.stop(seed, covx, n=dim(X)[1], u.max=20, eps=0.001)

Ordinary least squares

Description

Returns ordinary least squares estimates. And, the function results in subclass "seedols". For this function to work, either X or Y has to be one-dimensional. It is not necessary that X and Y should be predictors and response, respectively. Regardless of the position in the arguments, the one-dimensional and multi-dimensional variables become response and predictors, respectively.

Usage

seedols(X, Y)seedols(X, Y)

Arguments

`X`	numeric vector or matrix, a first set of variables
`Y`	numeric vector or matrix, a second set of variables

Value

`coef`	the estimated coefficients for each iterative projection upto u
`X`	X, the first set
`Y`	Y, the second set

Examples

########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

ols1 <- seedols(X[,1:4],Y[,1])
ols2 <- seedols(Y[,1],X[,1:4])
## ols1 and ols2 are the same results.
########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

ols1 <- seedols(X[,1:4],Y[,1])
ols2 <- seedols(Y[,1],X[,1:4])
## ols1 and ols2 are the same results.

Partial least squares through iterative projections

Description

Returns partial least squares estimates through iterative projections. And, the function results in subclass "seedpls".

Usage

seedpls(X, Y, u=5, scale=FALSE)seedpls(X, Y, u=5, scale=FALSE)

Arguments

`X`	numeric matrix (n * p), a set of predictors
`Y`	numeric vector or matrix (n * r), responses (it can be multi-dimensional)
`u`	numeric, the number of projections. The default is 5.
`scale`	logical, FALSE is default. If TRUE, each predictor is standardized with mean 0 and variance 1

Value

`coef`	the estimated coefficients for each iterative projection upto u
`u`	the maximum number of projections
`X`	Predictors
`Y`	Response
`scale`	status of scaling predictors

Examples

########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

fit.pls1 <- seedpls(X,Y[,1]) ## one-dimensional response
fit.pls2 <- seedpls(X,Y, u=6, scale=TRUE) ## four-dimensional response

########  data(cookie) ########
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

fit.pls1 <- seedpls(X,Y[,1]) ## one-dimensional response
fit.pls2 <- seedpls(X,Y, u=6, scale=TRUE) ## four-dimensional response

Function that guides a selection of the terminating index when using seedCCA function

Description

The usage of selectu depends on one of its arguments, type. If tyep="seed1", the n*F_u is computed for a higher dimension one of X and Y and a proper number of prjections is reported. For example, suppose that the dimension of X is higher than Y. Then selectu(X,Y, type="case1") and selectu(Y, X, u=5, type="case1") gives the same results, and it is for X. If type="seed2", n*F_u is computed for X and Y and proper numbers of projections for X and Y are reported. And, For type="seed2", num.d must be specified. Its defualt value is 2. The argument eps is a terminating condition for stopping projections. The projection is stopped, when the increment is less than the value of eps. The argument auto.stop=TRUE has the function automatically stopped as soon as the increment is less than the value of eps. If not, the increments are computed until the value of u.max is reached. The function selectu results in subclass "selectu".

Usage

selectu(X, Y, type="seed2", u.max=30, auto.stop=TRUE, num.d=2, eps=0.01)
selectu(X, Y, type="seed2", u.max=30, auto.stop=TRUE, num.d=2, eps=0.01)

Arguments

`X`	numeric matrix (n * p), the first set of variables
`Y`	numeric matrix (n * r), the second set of variables
`type`	character, the default is "seed2". "seed1" is for the first case of Seeded CCA (One set of variable is initially-reduced). "seed2" is for the second case of Seeded CCA (Two sets of variables are initially reduced).
`u.max`	numeric, t he maximum number of u. The default is equal to 30.
`auto.stop`	logical, The default value is TRUE. If TRUE, the iterative projection is automatically stopped, when the terminaion condition eps is satisfied. If FALSE, the iterative projections are stopped at the value of u.max.
`num.d`	numeric, the number of the "num.d" largest eigenvectors of cov(first.set, second.set), if case1=FALSE. The default value is equal to 2. This option works only for type="seed2".
`eps`	numeric, the default value is equal to 0.01. A value for terminating the projection.

Details

The order of the values depending on type is follows:

type="seed1"

type="seed2"

Value

`type="seed1"`	Values with selecting `type="seed1"`:
`nFu`	incrments (n*Fu) of the iterative projection for initally reduction one set of variable.
`proper.u`	proper value of the number of projections for X
`type`	types of seeded CCA
`eps`	a value for terminating the projection. The default value is equal to 0.01.
`type="seed2"`	Values with selecting `type="seed2"`:
`nFu.x`	incrments (n*Fu) of the iterative projection for initially reducing X.
`nFu.y`	incrments (n*Fu) of the iterative projection for initially reducing Y.
`proper.ux`	proper value of the number of projections for X
`proper.uy`	proper value of the number of projections for Y
`type`	types of seeded CCA
`eps`	a value for terminating the projection. The default value is equal to 0.01.

Examples

######  data(cookie) ######
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

selectu(X, Y, type="seed1")
selectu(X, Y, type="seed1", auto.stop=FALSE)
selectu(X, Y, type="seed2", eps=0.001, num.d=3)
selectu(X, Y, type="seed2", auto.stop=FALSE)



########  data(nutrimouse) ########
data(nutrimouse)
Y<-as.matrix(nutrimouse$lipid)
X<-as.matrix(nutrimouse$gene)
selectu(X, Y, type="seed2", num.d=4)
selectu(X, Y, type="seed2", num.d=4, eps=0.001)
selectu(X, Y, type="seed2", auto.stop=FALSE, num.d=4, eps=0.001)
######  data(cookie) ######
data(cookie)
myseq<-seq(141,651,by=2)
X<-as.matrix(cookie[-c(23,61),myseq])
Y<-as.matrix(cookie[-c(23,61),701:704])

selectu(X, Y, type="seed1")
selectu(X, Y, type="seed1", auto.stop=FALSE)
selectu(X, Y, type="seed2", eps=0.001, num.d=3)
selectu(X, Y, type="seed2", auto.stop=FALSE)



########  data(nutrimouse) ########
data(nutrimouse)
Y<-as.matrix(nutrimouse$lipid)
X<-as.matrix(nutrimouse$gene)
selectu(X, Y, type="seed2", num.d=4)
selectu(X, Y, type="seed2", num.d=4, eps=0.001)
selectu(X, Y, type="seed2", auto.stop=FALSE, num.d=4, eps=0.001)

Package 'seedCCA'

Help Index

Coefficients of ordinary and partial least squares through iterative projections

Description

Usage

Arguments

Examples

cookie dataset

Description

Usage

Format

References

Examples

scree-ploting cov(X, Y)

Description

Usage

Arguments

Value

Examples

finalized CCA in seeded CCA

Description

Usage

Arguments

Value

Examples

Fitted values of ordinary and partial least squares

Description

Usage

Arguments

Examples

Initialized CCA in seeded CCA

Description

Usage

Arguments

Value

Examples

Nutrimouse dataset

Description

Usage

Format

Details

References

Examples

Plotting class "seedCCA" depending on the value of type

Description

Usage

Arguments

Details

Value

Examples

Projection of a seed matrix on to the column subspace of M with respect to Sx inner-product

Description

Usage

Arguments

Examples

basic function for printing class "seedCCA"

Description

Usage

Arguments

Examples

Seeded Canonical correlation analysis

Description

Usage

Arguments

Details

Value

References

Examples

increments of iterative projections

Description

Usage

Arguments

Value

Examples

increments of iterative projections with automatic stopping

Description

Usage

Arguments

Value

Examples

Plotting class "seedCCA" depending on the value of `type`

Projection of a seed matrix on to the column subspace of M with respect to `Sx` inner-product