Multi-block Approach

2024-12-26 @Atsushi Kawaguchi

In this vignette, the output is omitted. Please refer to the following book for the output.

Kawaguchi A. (2021). Multivariate Analysis for Neuroimaging Data. CRC Press.

Multi-block PCA

Generate simulation data

The data are generated by the strsimdata function, which applies weights (loadings), some of which are set to zero, to randomly generated factors.

n = 100; seed = 2
dataset1 = strsimdata(n = n, ncomp=2, 
Xps=c(4,4), Ztype="binary", seed=seed)

The number of subjects is 100 and the number of factors is 2. The generated explanatory variable matrix X has 2 blocks with 4 variables each; the number of blocks is determined by the length of the vector Xps that specifies the number of variables per block. The Ztype argument controls the generation of the supervisor vector Z, which is binary here. Multi-block data are stored as a list of data matrices.

X2 = dataset1$X
Z = dataset1$Z
str(dataset1[c("X","Z")])

The weights for X are generated by normalizing normal random numbers to unit length, and they are stored as follows.

dataset1$WX

The first element of the list holds the super weights and the second element holds the block weights. In the block weights, rows correspond to components and columns correspond to variables.

The numbers of zero weights are as follows.

dataset1$nZeroX
dataset1$ZcoefX
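The data-generation idea described above can be sketched in base R. This is only a minimal illustration under stated assumptions, not the code inside strsimdata: the noise level, the positions of the zero weights, the choice to zero the weights before normalizing, and the names make_block, nzero, X_list and W_list are all hypothetical.

```r
set.seed(2)
n     <- 100
Xps   <- c(4, 4)   # variables per block; the number of blocks is length(Xps)
nzero <- c(1, 2)   # hypothetical number of zero weights in each block

f <- rnorm(n)      # one randomly generated factor shared by the blocks

make_block <- function(p, nz) {
  w <- rnorm(p)                 # normal random numbers
  w[seq_len(nz)] <- 0           # set the first nz weights to zero (assumed positions)
  w <- w / sqrt(sum(w^2))       # normalize to unit length
  # factor times weights plus small noise (noise sd is an assumption)
  list(w = w, X = f %o% w + matrix(rnorm(n * p, sd = 0.1), n, p))
}

blocks <- Map(make_block, Xps, nzero)
X_list <- lapply(blocks, `[[`, "X")   # multi-block data: a list of matrices
W_list <- lapply(blocks, `[[`, "w")

sapply(X_list, dim)                          # dimensions of each block
sapply(W_list, function(w) sum(w == 0))      # number of zero weights per block
sapply(W_list, function(w) sum(w^2))         # squared length of each weight vector
```

Variables with a zero weight carry only noise, which is what a sparse method is expected to detect.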

Supervised multi-block PCA

Perform supervised multi-block PCA by specifying not only X2 but also Z and the supervision parameter muX. First, select the number of components and the regularization parameter.

(opt212 = optparasearch(X2, Z=Z, muX=0.5, 
search.method = "ncomp1st", criterion="BIC", whichselect="X"))

Then fit supervised multi-block PCA with the msma function, using the selected number of components and regularization parameter.

(fit212 = msma(X2, Z=Z, muX=0.5, comp=opt212$optncomp, 
lambdaX=opt212$optlambdaX))
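To give some intuition for how a regularization parameter can produce weights that are exactly zero, the following base R sketch applies soft-thresholding to a weight vector and then rescales it to unit length. Soft-thresholding is a common operator in sparse PCA and PLS methods; that msma applies lambdaX in exactly this way is an assumption here, and soft_threshold, w and the value 0.2 are hypothetical.

```r
# Soft-thresholding: shrink every weight toward zero by lambda and
# set weights whose absolute value is at most lambda exactly to zero.
soft_threshold <- function(w, lambda) sign(w) * pmax(abs(w) - lambda, 0)

w <- c(0.8, -0.1, 0.05, -0.6)                  # hypothetical unpenalized weights
w_sparse <- soft_threshold(w, lambda = 0.2)    # small weights become zero
w_sparse <- w_sparse / sqrt(sum(w_sparse^2))   # rescale to unit length

w_sparse
sum(w_sparse == 0)                             # number of zero weights
```

A larger lambda zeroes more weights, which is why the parameter is chosen by a criterion such as BIC rather than fixed by hand.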

The results of the first and the second components are as follows.

par(mfrow=c(2,2), oma = c(0, 0, 2, 0))
plot(fit212, axes = 1, plottype="bar", 
block="super", XY="X")
plot(fit212, axes = 2, plottype="bar", 
block="super", XY="X")
plot(fit212, axes = 1, plottype="bar", 
block="block", XY="X")
plot(fit212, axes = 2, plottype="bar", 
block="block", XY="X")

The relationship between the super score and the binary outcome Z is examined.

par(mfrow=c(1,2))
for(i in 1:2){
  # Two-sample t-test of the i-th super score between the groups of Z
  t1 = t.test(fit212$ssX[,i]~Z)
  boxplot(fit212$ssX[,i]~Z, 
          main=paste("Comp", i),
          sub=paste("t-test p =", round(t1$p.value,4)))
}
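The comparison above can be tried without a fitted model. The sketch below uses a simulated score and a simulated binary outcome (both hypothetical, as are the names score, z and p_value) to show what the formula interface of t.test returns. By default, t.test performs Welch's two-sample test (var.equal = FALSE).

```r
set.seed(2)
z     <- rep(c(0, 1), each = 50)   # hypothetical binary outcome
score <- rnorm(100) + 2 * z        # hypothetical score, shifted in group 1

t1 <- t.test(score ~ z)            # Welch two-sample t-test
p_value <- t1$p.value

t1$estimate                        # group means
format.pval(p_value, digits = 2)   # readable label even for very small p-values
```

Note that round(p, 4) displays very small p-values as 0; format.pval or signif is one way to keep the label informative.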

PLS

Generate simulation data

The data are generated by the strsimdata function, which applies weights (loadings), some of which are set to zero, to randomly generated factors.

dataset2 = strsimdata(n = n, ncomp=2, Xps=c(4,4), 
Yps=c(3,5), Ztype="binary", cz=c(10,10), seed=seed)

The number of subjects is 100 and the number of factors is 2. The generated explanatory variable matrix X has 2 blocks with 4 variables each; the number of blocks is determined by the length of the vector Xps that specifies the number of variables per block. The same is true for the objective variable matrix Y, which has 2 blocks with 3 and 5 variables, respectively. The Ztype argument controls the generation of the supervisor vector Z; here, a binary Z is generated.

Multi-block data is a list of data matrices.

X2 = dataset2$X; Y2 = dataset2$Y
Z = dataset2$Z
str(dataset2[c("X","Y","Z")])

The weights for X are generated by normalizing normal random numbers to unit length, and they are stored as follows.

dataset2$WX

The first element of the list holds the super weights and the second element holds the block weights. In the block weights, rows correspond to components and columns correspond to variables.

The numbers of zero weights are as follows.

dataset2$nZeroX

The weights for Y are set in the same way as those for X.

dataset2$WY
dataset2$nZeroY
dataset2$ZcoefX
dataset2$ZcoefY

Supervised sparse PLS

Here, Z is additionally specified and supervised sparse PLS is performed. The supervision parameters muX and muY are both set to 0.3.

(opt222 = optparasearch(X2, Y2, Z, muX=0.3, muY=0.3, 
search.method = "ncomp1st", criterion="BIC", 
criterion4ncomp="BIC", whichselect=c("X","Y")))
(fit222 = msma(X2, Y2, Z, 
muX=0.3, muY=0.3, comp=opt222$optncomp,
lambdaX=opt222$optlambdaX, lambdaY=opt222$optlambdaY))
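As background for the fit above, the core idea of PLS can be sketched in base R for a single X block and a single Y block: the pair of unit-length weight vectors that maximizes the covariance between the X score and the Y score is given by the leading singular vectors of X'Y. This is a simplified illustration without sparsity, supervision, or multiple blocks, and it is not the msma implementation; the simulated data and all names below are hypothetical.

```r
set.seed(1)
n <- 100
f <- rnorm(n)   # latent factor shared by X and Y

# Hypothetical single blocks: only some variables load on the factor
X <- scale(f %o% c(1, 1, 0, 0) + matrix(rnorm(n * 4, sd = 0.5), n, 4))
Y <- scale(f %o% c(1, 0, 1)    + matrix(rnorm(n * 3, sd = 0.5), n, 3))

sv <- svd(crossprod(X, Y))   # singular value decomposition of X'Y
wx <- sv$u[, 1]              # X weights (unit length)
wy <- sv$v[, 1]              # Y weights (unit length)

tx <- drop(X %*% wx)         # X score
ty <- drop(Y %*% wy)         # Y score

round(cbind(wx), 2)          # weights are largest for variables driven by f
cor(tx, ty)                  # the two scores are strongly related
```

The sparse and supervised variants modify this criterion, so their weights will generally differ from the plain singular vectors.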

The results of the first and second components for X are as follows.

par(mfrow=c(2,2), oma = c(0, 0, 2, 0))
plot(fit222, axes = 1, plottype="bar", 
block="super", XY="X")
plot(fit222, axes = 2, plottype="bar", 
block="super", XY="X")
plot(fit222, axes = 1, plottype="bar", 
block="block", XY="X")
plot(fit222, axes = 2, plottype="bar", 
block="block", XY="X")

The results of the first and second components for Y are as follows.

par(mfrow=c(2,2), oma = c(0, 0, 2, 0))
plot(fit222, axes = 1, plottype="bar", 
block="super", XY="Y")
plot(fit222, axes = 2, plottype="bar", 
block="super", XY="Y")
plot(fit222, axes = 1, plottype="bar", 
block="block", XY="Y")
plot(fit222, axes = 2, plottype="bar", 
block="block", XY="Y")
par(mfrow=c(1,2))
for(i in 1:2) plot(fit222, axes = i, XY="XY")

The relationship between the super scores and the binary outcome Z is examined to assess the effect of supervision. The result with supervision is as follows.

par(mfrow=c(2,2))
for(xy in c("X","Y")){
  for(i in 1:2){
    # i-th super score of X (ssX) or Y (ssY)
    ss = fit222[[paste0("ss", xy)]][,i]
    t1 = t.test(ss~Z)
    boxplot(ss~Z, main=paste(xy, "Comp", i),
            sub=paste("t-test p =", round(t1$p.value,4)))
  }
}