Package 'TensorClustering'

Title: Model-Based Tensor Clustering
Description: Performs model-based tensor clustering, including the Tensor Gaussian Mixture Model (TGMM), the Tensor Envelope Mixture Model (TEMM) by Deng and Zhang (2021) <DOI: 10.1111/biom.13486>, and the Doubly-Enhanced EM (DEEM) algorithm by Mai, Zhang, Pan and Deng (2021) <DOI: 10.1080/01621459.2021.1904959>.
Authors: Kai Deng [aut, cre], Yuqing Pan [aut], Xin Zhang [aut], Qing Mai [aut]
Maintainer: Kai Deng <[email protected]>
License: MIT + file LICENSE
Version: 1.0.2
Built: 2024-12-25 06:37:49 UTC
Source: CRAN

Help Index


Doubly-enhanced EM algorithm

Description

Doubly-enhanced EM algorithm for tensor clustering

Usage

DEEM(X, nclass, niter = 100, lambda = NULL, dfmax = n, pmax = nvars, pf = rep(1, nvars),
eps = 1e-04, maxit = 1e+05, sml = 1e-06, verbose = FALSE, ceps = 0.1,
initial = TRUE, vec_x = NULL)

Arguments

X

Input tensor (or matrix) list of length n, where n is the number of observations. Each element of the list is a tensor or matrix. The tensor order can be any integer greater than or equal to 2.

nclass

Number of clusters.

niter

Maximum iteration times for EM algorithm. Default value is 100.

lambda

A user-specified lambda value. lambda is the weight of the L1 penalty; a smaller lambda allows more variables to be nonzero.

dfmax

The maximum number of selected variables in the model. Default is the number of observations n.

pmax

The maximum number of variables that may be selected during iteration. In intermediate steps, the algorithm can select at most pmax variables and then shrink some of them so that the number of finally selected variables does not exceed dfmax.

pf

Weights of the lasso penalty. Default is a vector of ones of length nvars, which gives the standard L1 penalty. Can be modified to use an adaptive lasso penalty.

eps

Convergence threshold for the change in coordinate descent between iterations. Default value is 1e-04.

maxit

Maximum number of coordinate descent iterations over all lambda values. Default value is 1e+05.

sml

Threshold for the ratio of the change in the loss function after each iteration to the previous loss value. Default value is 1e-06.

verbose

Whether to print lambda during iteration. Default value is FALSE.

ceps

Convergence threshold for the difference in cluster means between iterations. Default value is 0.1.

initial

Whether to initialize algorithm with K-means clustering. Default value is TRUE.

vec_x

Vectorized tensor data. Default value is NULL.

Details

The DEEM function implements the Doubly-Enhanced EM algorithm (DEEM) for tensor clustering. The observations $\mathbf{X}_i$ are assumed to follow the tensor normal mixture model (TNMM) with covariances common to all clusters:

$$\mathbf{X}_i\sim\sum_{k=1}^K\pi_k \mathrm{TN}(\bm{\mu}_k;\bm{\Sigma}_1,\ldots,\bm{\Sigma}_M),\quad i=1,\dots,n,$$

where $0<\pi_k<1$ is the prior probability for $\mathbf{X}$ to be in the $k$-th cluster, with $\sum_{k=1}^{K}\pi_k=1$; $\bm{\mu}_k$ is the mean of the $k$-th cluster; and $\bm{\Sigma}_1,\ldots,\bm{\Sigma}_M$ are the covariances common to all clusters. Under the TNMM framework, the optimal clustering rule can be shown to be

$$\widehat{Y}^{opt}=\arg\max_k\{\log\pi_k+\langle\mathbf{X}-(\bm{\mu}_1+\bm{\mu}_k)/2,\mathbf{B}_k\rangle\},$$

where $\mathbf{B}_k=[\![\bm{\mu}_k-\bm{\mu}_1;\bm{\Sigma}_1^{-1},\ldots,\bm{\Sigma}_M^{-1}]\!]$. In the enhanced E-step, DEEM imposes sparsity directly on the optimal clustering rule, as a flexible alternative to the popular low-rank assumptions on the tensor coefficients $\mathbf{B}_k$, by solving

$$\min_{\mathbf{B}_2,\dots,\mathbf{B}_K}\bigg[\sum_{k=2}^K\Big(\langle\mathbf{B}_k,[\![\mathbf{B}_k;\widehat{\bm{\Sigma}}_1^{(t)},\ldots,\widehat{\bm{\Sigma}}_M^{(t)}]\!]\rangle-2\langle\mathbf{B}_k,\widehat{\bm{\mu}}_k^{(t)}-\widehat{\bm{\mu}}_1^{(t)}\rangle\Big) +\lambda^{(t+1)}\sum_{\mathcal{J}}\sqrt{\sum_{k=2}^Kb_{k,\mathcal{J}}^2}\bigg],$$

where the superscript $(t)$ indexes the EM iteration, $\lambda^{(t+1)}$ is a tuning parameter, and $b_{k,\mathcal{J}}$ is the entry of $\mathbf{B}_k$ at index $\mathcal{J}$, so that the penalty groups each entry across $k=2,\dots,K$. In the enhanced M-step, DEEM employs a new estimator for the tensor correlation structure, which facilitates both computation and theoretical analysis.
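For matrix-valued data ($M=2$) with symmetric covariances, the Tucker product reduces to ordinary matrix products, $[\![\mathbf{A};\mathbf{S}_1,\mathbf{S}_2]\!]=\mathbf{S}_1\mathbf{A}\mathbf{S}_2$, so both the optimal rule and the enhanced E-step objective can be written in a few lines of base R. The sketch below is illustrative only and is not the package's internal code; the helper names `coef_Bk`, `classify_opt` and `deem_objective` are hypothetical, and the parameters (or their current estimates) are assumed to be supplied as plain matrices.

```r
# Sketch for matrix data (M = 2); helper names are hypothetical, not package API.
# For symmetric S1, S2: [[A; S1, S2]] = S1 %*% A %*% S2.

# B_k = [[mu_k - mu_1; Sigma1^{-1}, Sigma2^{-1}]]
coef_Bk <- function(mu_k, mu_1, Sigma1, Sigma2) {
  solve(Sigma1, mu_k - mu_1) %*% solve(Sigma2)
}

# Optimal rule: argmax_k { log(pi_k) + <X - (mu_1 + mu_k)/2, B_k> }, with B_1 = 0
classify_opt <- function(X, prior, mu, Sigma1, Sigma2) {
  score <- vapply(seq_along(prior), function(k) {
    Bk <- coef_Bk(mu[[k]], mu[[1]], Sigma1, Sigma2)
    # <A, B> is the entrywise inner product sum(A * B)
    log(prior[k]) + sum((X - (mu[[1]] + mu[[k]]) / 2) * Bk)
  }, numeric(1))
  which.max(score)
}

# Enhanced E-step objective for B_2, ..., B_K (Blist[[j]] holds B_{j+1})
deem_objective <- function(Blist, mu, Sigma1, Sigma2, lambda) {
  loss <- 0
  for (j in seq_along(Blist)) {
    Bk <- Blist[[j]]
    k <- j + 1
    loss <- loss + sum(Bk * (Sigma1 %*% Bk %*% Sigma2)) -
      2 * sum(Bk * (mu[[k]] - mu[[1]]))
  }
  # Group-lasso penalty: entry J of B_2, ..., B_K forms one group
  Bmat <- do.call(cbind, lapply(Blist, as.vector))
  loss + lambda * sum(sqrt(rowSums(Bmat^2)))
}
```

With `lambda = 0`, the quadratic loss is minimized at $\mathbf{B}_k=\bm{\Sigma}_1^{-1}(\bm{\mu}_k-\bm{\mu}_1)\bm{\Sigma}_2^{-1}$, which is exactly the coefficient returned by `coef_Bk`; the group penalty then shrinks whole entries of $\mathbf{B}_2,\dots,\mathbf{B}_K$ to zero together.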

Value

pi

A vector of estimated prior probabilities for clusters.

mu

A list of estimated cluster means.

sigma

A list of estimated covariance matrices.

gamma

An n-by-nclass matrix of estimated membership weights.

y

A vector of estimated labels.

iter

Number of iterations until convergence.

df

Average number of zero elements in beta over iterations.

beta

A matrix of the vectorized $\mathbf{B}_k$.

Author(s)

Kai Deng, Yuqing Pan, Xin Zhang and Qing Mai

References

Mai, Q., Zhang, X., Pan, Y. and Deng, K. (2021). A Doubly-Enhanced EM Algorithm for Model-Based Tensor Clustering. Journal of the American Statistical Association.

See Also

tune_lamb, tune_K

Examples

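library(TensorClustering)
set.seed(1)                    # for reproducibility

# Simulate n = 100 tensors of dimension 5 x 5 x 5 from K = 2 clusters
dimen = c(5,5,5)
nvars = prod(dimen)
K = 2
n = 100
# Identity covariance on every mode (the noise below is i.i.d. N(0, 1))
sigma = array(list(),3)
sigma[[1]] = sigma[[2]] = sigma[[3]] = diag(5)

# The two cluster means differ only in entries [1:3, 1, 1]
B2=array(0,dim=dimen)
B2[1:3,1,1]=2

y = c(rep(1,50),rep(2,50))     # true labels
M = array(list(),K)
M[[1]] = array(0,dim=dimen)
M[[2]] = B2

# Each observation is its cluster mean plus standard normal noise
vec_x=matrix(rnorm(n*prod(dimen)),ncol=n)
X=array(list(),n)
for (i in 1:n){
  X[[i]] = array(vec_x[,i],dim=dimen)
  X[[i]] = M[[y[i]]] + X[[i]]
}

myfit = DEEM(X, nclass=2, lambda=0.05)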

Fit the Tensor Envelope Mixture Model (TEMM)

Description

Fit the Tensor Envelope Mixture Model (TEMM)

Usage

TEMM(Xn, u, K, initial = "kmeans", iter.max = 500, 
stop = 1e-3, trueY = NULL, print = FALSE)

Arguments

Xn

The tensor to be clustered, given as an array whose last dimension is the sample size n.

u

A vector of envelope dimensions, one for each tensor mode.

K

Number of clusters, greater than or equal to 2.

initial

Initialization method for the regularized EM algorithm. Default value is "kmeans".

iter.max

Maximum number of iterations. Default value is 500.

stop

Convergence threshold of relative change in cluster means. Default value is 1e-3.

trueY

A vector of true cluster labels of each observation. Default value is NULL.

print

Whether to print, at each iteration, the current iteration number, the relative change in cluster means and the clustering error (%). Default value is FALSE.

Details

The TEMM function fits the Tensor Envelope Mixture Model (TEMM) through a subspace-regularized EM algorithm. For mode $m$, let $(\bm{\Gamma}_m,\bm{\Gamma}_{0m})\in R^{p_m\times p_m}$ be an orthogonal matrix, where $\bm{\Gamma}_{m}\in R^{p_{m}\times u_{m}}$, $u_{m}\leq p_{m}$, corresponds to the material part. Specifically, the material part $\mathbf{X}_{\star,m}=\mathbf{X}\times_{m}\bm{\Gamma}_{m}^{T}$ follows a tensor normal mixture distribution, while the immaterial part $\mathbf{X}_{\circ,m}=\mathbf{X}\times_{m}\bm{\Gamma}_{0m}^{T}$ is unimodal and independent of the material part, and hence can be eliminated without loss of clustering information. Dimension reduction is achieved by focusing on the material part $\mathbf{X}_{\star,m}$. Collectively, the joint reduction over all modes is

$$\mathbf{X}_{\star}=[\![\mathbf{X};\bm{\Gamma}_{1}^{T},\dots,\bm{\Gamma}_{M}^{T}]\!]\sim\sum_{k=1}^{K}\pi_{k}\mathrm{TN}(\bm{\alpha}_{k};\bm{\Omega}_{1},\dots,\bm{\Omega}_{M}),\quad \mathbf{X}_{\star}\perp\!\!\!\perp\mathbf{X}_{\circ,m},$$

where $\bm{\alpha}_{k}\in R^{u_{1}\times\cdots\times u_{M}}$ and $\bm{\Omega}_m\in R^{u_m\times u_m}$ are the dimension-reduced clustering parameters, and $\mathbf{X}_{\circ,m}$ does not vary with the cluster index $Y$. In the E-step, the membership weights are evaluated as

$$\widehat{\eta}_{ik}^{(s)}=\frac{\widehat{\pi}_{k}^{(s-1)}f_{k}(\mathbf{X}_i;\widehat{\bm{\theta}}^{(s-1)})}{\sum_{l=1}^{K}\widehat{\pi}_{l}^{(s-1)}f_{l}(\mathbf{X}_i;\widehat{\bm{\theta}}^{(s-1)})},$$

where $f_k$ denotes the conditional probability density function of $\mathbf{X}_i$ within the $k$-th cluster. In the subspace-regularized M-step, the envelope subspace is iteratively estimated through a Grassmann manifold optimization that minimizes the following log-likelihood-based objective function:

$$G_m^{(s)}(\bm{\Gamma}_m) = \log|\bm{\Gamma}_m^T \mathbf{M}_m^{(s)} \bm{\Gamma}_m|+\log|\bm{\Gamma}_m^T (\mathbf{N}_m^{(s)})^{-1} \bm{\Gamma}_m|,$$

where $\mathbf{M}_{m}^{(s)}$ and $\mathbf{N}_{m}^{(s)}$ are given by

$$\mathbf{M}_m^{(s)} = \frac{1}{np_{-m}}\sum_{i=1}^{n} \sum_{k=1}^{K}\widehat{\eta}_{ik}^{(s)} (\bm{\epsilon}_{ik}^{(s)})_{(m)}(\widehat{\bm{\Sigma}}_{-m}^{(s-1)})^{-1} (\bm{\epsilon}_{ik}^{(s)})_{(m)}^T,$$

$$\mathbf{N}_m^{(s)} = \frac{1}{np_{-m}}\sum_{i=1}^{n} (\mathbf{X}_i)_{(m)}(\widehat{\bm{\Sigma}}_{-m}^{(s-1)})^{-1}(\mathbf{X}_i)_{(m)}^T.$$

The intermediate estimator $\mathbf{M}_{m}^{(s)}$ can be viewed as the mode-$m$ conditional variation estimate of $\mathbf{X}\mid Y$, and $\mathbf{N}_{m}^{(s)}$ as the mode-$m$ marginal variation estimate of $\mathbf{X}$.
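The two computational pieces above, the E-step weights and the Grassmann objective $G_m$, can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the TEMM implementation: `estep_weights` and `G_obj` are hypothetical names, the log-densities $\log f_k(\mathbf{X}_i)$ are assumed to be precomputed into an $n\times K$ matrix, and `M`, `N` stand for $\mathbf{M}_m^{(s)}$ and $\mathbf{N}_m^{(s)}$. The weights are normalized on the log scale (log-sum-exp), a standard guard against numerical underflow.

```r
# Sketch only; helper names are hypothetical, not the package API.

# E-step: eta_ik = pi_k f_k(X_i) / sum_l pi_l f_l(X_i).
# logf is an n x K matrix with entries log f_k(X_i); prior is (pi_1, ..., pi_K).
estep_weights <- function(logf, prior) {
  a <- sweep(logf, 2, log(prior), "+")  # log(pi_k) + log f_k(X_i)
  amax <- apply(a, 1, max)              # row-wise maximum
  w <- exp(a - amax)                    # shift each row by its max (log-sum-exp)
  w / rowSums(w)                        # each row now sums to 1
}

# G_m(Gamma) = log|Gamma' M Gamma| + log|Gamma' N^{-1} Gamma|
G_obj <- function(Gamma, M, N) {
  logdet <- function(A) as.numeric(determinant(A, logarithm = TRUE)$modulus)
  logdet(t(Gamma) %*% M %*% Gamma) + logdet(t(Gamma) %*% solve(N) %*% Gamma)
}
```

Because $G_m$ depends on $\bm{\Gamma}_m$ only through its column space (replacing $\bm{\Gamma}_m$ by $\bm{\Gamma}_m\mathbf{O}$ for an orthogonal $\mathbf{O}$ leaves both determinants unchanged), the minimization is naturally posed on a Grassmann manifold.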

Value

id

A vector of estimated labels.

pi

A vector of estimated prior probabilities for clusters.

eta

An n-by-K matrix of estimated membership weights.

Mu.est

A list of estimated cluster means.

SIG.est

A list of estimated covariance matrices.

Mm

Estimate of $\mathbf{M}_m$ as defined in Deng and Zhang (2021).

Nm

Estimate of $\mathbf{N}_m$ as defined in Deng and Zhang (2021).

Gamma.est

A list of estimated envelope bases.

PGamma.est

A list of envelope projection matrices.

Author(s)

Kai Deng, Yuqing Pan, Xin Zhang and Qing Mai

References

Deng, K. and Zhang, X. (2021). Tensor Envelope Mixture Model for Simultaneous Clustering and Multiway Dimension Reduction. Biometrics.

See Also

TGMM, tune_u_sep, tune_u_joint

Examples

A = array(c(rep(1,20),rep(2,20))+rnorm(40),dim=c(2,2,10))
myfit = TEMM(A,u=c(2,2),K=2)

Fit the Tensor Gaussian Mixture Model (TGMM)

Description

Fit the Tensor Gaussian Mixture Model (TGMM)

Usage

TGMM(Xn, K, shape = "shared", initial = "kmeans", 
iter.max = 500, stop = 1e-3, trueY = NULL, print = FALSE)

Arguments

Xn

The tensor to be clustered, given as an array whose last dimension is the sample size n.

K

Number of clusters, greater than or equal to 2.

shape

"shared" to assume a common covariance structure across mixture components, or "distinct" to allow cluster-specific covariance structures. Default value is "shared".

initial

Initialization method for the EM algorithm. Default value is "kmeans".

iter.max

Maximum number of iterations. Default value is 500.

stop

Convergence threshold of relative change in cluster means. Default value is 1e-3.

trueY

A vector of true cluster labels of each observation. Default value is NULL.

print

Whether to print, at each iteration, the current iteration number, the relative change in cluster means and the clustering error (%). Default value is FALSE.

Details

The TGMM function fits the Tensor Gaussian Mixture Model (TGMM) through the classical EM algorithm. TGMM assumes the following tensor normal mixture distribution for $M$-way tensor data $\mathbf{X}$:

$$\mathbf{X}_i\sim\sum_{k=1}^K\pi_k \mathrm{TN}(\bm{\mu}_k;\mathcal{M}_k),\quad i=1,\dots,n,$$

where $0<\pi_k<1$ is the prior probability for $\mathbf{X}$ to be in the $k$-th cluster, with $\sum_{k=1}^{K}\pi_k=1$; $\bm{\mu}_k$ is the mean of the $k$-th cluster; and $\mathcal{M}_k \equiv \{\bm{\Sigma}_{km}, m=1,\dots,M\}$ is the set of covariances of the $k$-th cluster. If the $\mathcal{M}_k$ are the same for $k=1,\dots,K$, call TGMM with argument shape="shared"; otherwise use shape="distinct".
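For $M=2$, the tensor normal $\mathrm{TN}(\bm{\mu};\bm{\Sigma}_1,\bm{\Sigma}_2)$ is the matrix normal distribution, i.e. $\mathrm{vec}(\mathbf{X})\sim N(\mathrm{vec}(\bm{\mu}),\bm{\Sigma}_2\otimes\bm{\Sigma}_1)$, and its log-density (the $\log f_k$ needed in the E-step) can be evaluated without forming the Kronecker product. The function name `dtn2_log` is hypothetical; this is a sketch for matrix data, not code from the package.

```r
# Sketch only; dtn2_log is a hypothetical name, not the package API.
# Log-density of the matrix normal TN(mu; Sigma1, Sigma2) for a p x q matrix X,
# equivalent to vec(X) ~ N(vec(mu), Sigma2 %x% Sigma1).
dtn2_log <- function(X, mu, Sigma1, Sigma2) {
  p <- nrow(X); q <- ncol(X)
  E <- X - mu
  ld1 <- as.numeric(determinant(Sigma1)$modulus)  # log|Sigma1|
  ld2 <- as.numeric(determinant(Sigma2)$modulus)  # log|Sigma2|
  # tr(Sigma2^{-1} E' Sigma1^{-1} E) = vec(E)' (Sigma2 %x% Sigma1)^{-1} vec(E)
  quad <- sum(diag(solve(Sigma2, t(E)) %*% solve(Sigma1, E)))
  # log|Sigma2 %x% Sigma1| = q log|Sigma1| + p log|Sigma2|
  -0.5 * (p * q * log(2 * base::pi) + q * ld1 + p * ld2 + quad)
}
```

Under shape="distinct" each cluster would supply its own pair $(\bm{\Sigma}_{k1},\bm{\Sigma}_{k2})$; under shape="shared" the same pair is reused for every $k$.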

Value

id

A vector of estimated labels.

pi

A vector of estimated prior probabilities for clusters.

eta

A n by K matrix of estimated membership weights.

Mu.est

A list of estimated cluster means.

SIG.est

A list of estimated covariance matrices.

Author(s)

Kai Deng, Yuqing Pan, Xin Zhang and Qing Mai

References

Deng, K. and Zhang, X. (2021). Tensor Envelope Mixture Model for Simultaneous Clustering and Multiway Dimension Reduction. Biometrics.

Tait, P. A. and McNicholas, P. D. (2019). Clustering higher order data: Finite mixtures of multidimensional arrays. arXiv:1907.08566.

See Also

TEMM

Examples

A = array(c(rep(1,20),rep(2,20))+rnorm(40),dim=c(2,2,10))
myfit = TGMM(A,K=2,shape="shared")

Select the number of clusters K in DEEM

Description

Select the number of clusters K along with tuning parameter lambda through BIC in DEEM.

Usage

tune_K(X, seqK, seqlamb, initial = TRUE, vec_x = NULL)

Arguments

X

Input tensor (or matrix) list of length n, where n is the number of observations. Each element of the list is a tensor or matrix. The tensor order can be any integer greater than or equal to 2.

seqK

A sequence of user-specified numbers of clusters.

seqlamb

A sequence of user-specified lambda values. lambda is the weight of the L1 penalty; a smaller lambda allows more variables to be nonzero.

initial

Whether to initialize algorithm with K-means clustering. Default value is TRUE.

vec_x

Vectorized tensor data. Default value is NULL.

Details

The tune_K function runs the tune_lamb function length(seqK) times to choose the tuning parameter $\lambda$ and the number of clusters $K$ simultaneously. Let $\widehat{\bm{\theta}}^{\{\lambda,K\}}$ be the output of DEEM with the tuning parameter and number of clusters fixed at $\lambda$ and $K$, respectively; tune_K looks for the values of $\lambda$ and $K$ that minimize

$$\mathrm{BIC}(\lambda,K)=-2\sum_{i=1}^n\log\Big(\sum_{k=1}^K\widehat{\pi}^{\{\lambda,K\}}_kf_k(\mathbf{X}_i;\widehat{\bm{\theta}}_k^{\{\lambda,K\}})\Big)+\log(n)\cdot |\widehat{\mathcal{D}}^{\{\lambda,K\}}|,$$

where $\widehat{\mathcal{D}}^{\{\lambda,K\}}=\{(k, \mathcal{J}): \widehat b_{k,\mathcal{J}}^{\{\lambda,K\}} \neq 0 \}$ is the set of nonzero elements in $\widehat{\mathbf{B}}_2^{\{\lambda,K\}},\ldots,\widehat{\mathbf{B}}_K^{\{\lambda,K\}}$. The tune_K function intrinsically selects the initial point and returns the optimal estimated labels.
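Once DEEM has been fitted at every $(\lambda,K)$ pair, the selection itself is an argmin of $\mathrm{BIC}(\lambda,K)$ over the grid. The sketch below assumes the fits have been summarized in a data frame with hypothetical columns `lambda`, `K`, `loglik` (the mixture log-likelihood) and `n_nonzero` (the count $|\widehat{\mathcal{D}}^{\{\lambda,K\}}|$); it is not the tune_K implementation.

```r
# Sketch only; function and column names are hypothetical.
# BIC = -2 * (mixture log-likelihood) + log(n) * (number of nonzero entries)
bic_value <- function(loglik, n, n_nonzero) {
  -2 * loglik + log(n) * n_nonzero
}

# grid: one row per fitted (lambda, K) pair, with columns
#   lambda, K, loglik (sum_i log sum_k pi_k f_k(X_i)), n_nonzero (|D|)
pick_lambda_K <- function(grid, n) {
  grid$bic <- bic_value(grid$loglik, n, grid$n_nonzero)
  grid[which.min(grid$bic), , drop = FALSE]  # row with the smallest BIC
}
```

The $\log(n)$ factor charges each nonzero entry of $\widehat{\mathbf{B}}_k$, so a larger $K$ or a smaller $\lambda$ is selected only when the gain in log-likelihood outweighs the extra nonzero coefficients.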

Value

opt_K

Selected number of clusters that leads to optimal BIC.

opt_lamb

Tuned lambda that leads to optimal BIC.

Krank

A selection summary.

Author(s)

Kai Deng, Yuqing Pan, Xin Zhang and Qing Mai

References

Mai, Q., Zhang, X., Pan, Y. and Deng, K. (2021). A Doubly-Enhanced EM Algorithm for Model-Based Tensor Clustering. Journal of the American Statistical Association.

See Also

DEEM, tune_lamb

Examples

dimen = c(5,5,5)
nvars = prod(dimen)
K = 2
n = 100
sigma = array(list(),3)

sigma[[1]] = sigma[[2]] = sigma[[3]] = diag(5)

B2=array(0,dim=dimen)
B2[1:3,1,1]=2

y = c(rep(1,50),rep(2,50))
M = array(list(),K)
M[[1]] = array(0,dim=dimen)
M[[2]] = B2

vec_x=matrix(rnorm(n*prod(dimen)),ncol=n)
X=array(list(),n)
for (i in 1:n){
  X[[i]] = array(vec_x[,i],dim=dimen)
  X[[i]] = M[[y[i]]] + X[[i]]
}

mytune = tune_K(X, seqK=2:4, seqlamb=seq(0.01,0.1,by=0.01))

Parameter tuning in enhanced E-step in DEEM

Description

Perform parameter tuning through BIC in DEEM.

Usage

tune_lamb(X, K, seqlamb, initial = TRUE, vec_x = NULL)

Arguments

X

Input tensor (or matrix) list of length n, where n is the number of observations. Each element of the list is a tensor or matrix. The tensor order can be any integer greater than or equal to 2.

K

Number of clusters.

seqlamb

A sequence of user-specified lambda values. lambda is the weight of the L1 penalty; a smaller lambda allows more variables to be nonzero.

initial

Whether to initialize algorithm with K-means clustering. Default value is TRUE.

vec_x

Vectorized tensor data. Default value is NULL.

Details

The tune_lamb function adopts a BIC-type criterion to select the tuning parameter $\lambda$ in the enhanced E-step. Let $\widehat{\bm{\theta}}^{\lambda}$ be the output of DEEM with the tuning parameter fixed at $\lambda$; tune_lamb looks for the value of $\lambda$ that minimizes

$$\mathrm{BIC}(\lambda)=-2\sum_{i=1}^n\log\Big(\sum_{k=1}^K\widehat{\pi}^{\lambda}_kf_k(\mathbf{X}_i;\widehat{\bm{\theta}}_k^{\lambda})\Big)+\log(n)\cdot |\widehat{\mathcal{D}}^{\lambda}|,$$

where $\widehat{\mathcal{D}}^{\lambda}=\{(k, \mathcal{J}): \widehat b_{k,\mathcal{J}}^{\lambda} \neq 0 \}$ is the set of nonzero elements in $\widehat{\mathbf{B}}_2^{\lambda},\ldots,\widehat{\mathbf{B}}_K^{\lambda}$. The tune_lamb function intrinsically selects the initial point and returns the optimal estimated labels.
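The two ingredients of $\mathrm{BIC}(\lambda)$, the mixture log-likelihood and the count $|\widehat{\mathcal{D}}^{\lambda}|$, can be computed as below. This is a hedged sketch rather than the tune_lamb code: it assumes the component log-densities $\log f_k(\mathbf{X}_i;\widehat{\bm{\theta}}_k^{\lambda})$ are already available as an $n\times K$ matrix and that $\widehat{\mathbf{B}}_2^{\lambda},\ldots,\widehat{\mathbf{B}}_K^{\lambda}$ are given as a list of arrays; all function names are hypothetical.

```r
# Sketch only; function names are hypothetical, not the package API.

# sum_i log( sum_k pi_k f_k(X_i) ), computed stably via log-sum-exp
mixture_loglik <- function(logf, prior) {
  a <- sweep(logf, 2, log(prior), "+")   # log(pi_k) + log f_k(X_i)
  amax <- apply(a, 1, max)
  sum(amax + log(rowSums(exp(a - amax))))
}

# |D|: number of nonzero entries across B_2, ..., B_K
count_nonzero <- function(Blist) {
  sum(vapply(Blist, function(B) sum(B != 0), integer(1)))
}

# BIC(lambda) for one fitted value of lambda
bic_lambda <- function(logf, prior, Blist) {
  n <- nrow(logf)
  -2 * mixture_loglik(logf, prior) + log(n) * count_nonzero(Blist)
}
```

Evaluating `bic_lambda` for each value in seqlamb and taking the smallest mirrors the selection rule stated above.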

Value

opt_lamb

Tuned lambda that leads to optimal BIC.

opt_bic

BIC value at the tuned lambda.

opt_y

Estimated labels fitted by DEEM with tuned lambda.

Author(s)

Kai Deng, Yuqing Pan, Xin Zhang and Qing Mai

References

Mai, Q., Zhang, X., Pan, Y. and Deng, K. (2021). A Doubly-Enhanced EM Algorithm for Model-Based Tensor Clustering. Journal of the American Statistical Association.

See Also

DEEM, tune_K

Examples

dimen = c(5,5,5)
nvars = prod(dimen)
K = 2
n = 100
sigma = array(list(),3)

sigma[[1]] = sigma[[2]] = sigma[[3]] = diag(5)

B2=array(0,dim=dimen)
B2[1:3,1,1]=2

y = c(rep(1,50),rep(2,50))
M = array(list(),K)
M[[1]] = array(0,dim=dimen)
M[[2]] = B2

vec_x=matrix(rnorm(n*prod(dimen)),ncol=n)
X=array(list(),n)
for (i in 1:n){
  X[[i]] = array(vec_x[,i],dim=dimen)
  X[[i]] = M[[y[i]]] + X[[i]]
}

mytune = tune_lamb(X, K=2, seqlamb=seq(0.01,0.1,by=0.01))

Tuning envelope dimension jointly by BIC in TEMM.

Description

Tuning envelope dimension jointly by BIC in TEMM.

Usage

tune_u_joint(u_candi, K, X, iter.max = 500, stop = 0.001, trueY = NULL)

Arguments

u_candi

A list of length M containing the candidate envelope dimensions for each mode.

K

Number of clusters, greater than or equal to 2.

X

The tensor to be clustered, given as an array whose last dimension is the sample size n.

iter.max

Maximum number of iterations. Default value is 500.

stop

Convergence threshold of relative change in cluster means. Default value is 1e-3.

trueY

A vector of true cluster labels of each observation. Default value is NULL.

Details

The tune_u_joint function searches over all combinations of $u\equiv(u_1,\dots,u_M)$ in the neighborhood of $\widetilde{u}$, $\mathcal{N}(\widetilde u)=\{u:\ \max(1,\widetilde u_m-2) \leq u_m \leq \min(\widetilde u_m+2,p_m),\ m=1,\dots,M\}$, for the one that minimizes

$$\mathrm{BIC}(u) = -2\sum_{i=1}^{n}\log\Big(\sum_{k=1}^{K}\widehat{\pi}_k^u f_k(\mathbf{X}_i;\widehat{\bm{\theta}}^u)\Big) + \log(n)\cdot K_u.$$

In the above BIC, $K_u=(K-1)\prod_{m=1}^M u_m + \sum_{m=1}^{M}p_m(p_m+1)/2$ is the total number of parameters in TEMM, and $\widehat{\pi}_k^u$ and $\widehat{\bm{\theta}}^{u}$ are the estimated parameters with the envelope dimension fixed at $u$. The tune_u_joint function intrinsically selects the initial point and returns the optimal estimated labels.
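The search set $\mathcal{N}(\widetilde u)$ and the parameter count $K_u$ are easy to enumerate directly; the sketch below uses base R's `expand.grid`. The names `envelope_neighborhood` and `n_params_temm` are hypothetical and this is not the package's code; each row of the returned grid would then be fitted with TEMM and scored by $\mathrm{BIC}(u)$.

```r
# Sketch only; function names are hypothetical, not the package API.

# N(u_tilde): all u with max(1, u_tilde[m] - 2) <= u[m] <= min(u_tilde[m] + 2, p[m])
envelope_neighborhood <- function(u_tilde, p) {
  ranges <- lapply(seq_along(u_tilde), function(m)
    seq(max(1, u_tilde[m] - 2), min(u_tilde[m] + 2, p[m])))
  expand.grid(ranges)  # one row per candidate u, one column per mode
}

# K_u = (K - 1) * prod(u) + sum(p * (p + 1) / 2)
n_params_temm <- function(u, p, K) {
  (K - 1) * prod(u) + sum(p * (p + 1) / 2)
}
```

The neighborhood contains at most $5^M$ candidates, which keeps the joint search tractable for low-order tensors.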

Value

opt.u

Optimal envelope dimension selected.

opt.id

Estimated labels fitted by TEMM with the optimal envelope dimension.

opt.Mu

Estimated cluster means fitted by TEMM with the optimal envelope dimension.

bic

BIC value.

Author(s)

Kai Deng, Yuqing Pan, Xin Zhang and Qing Mai

References

Deng, K. and Zhang, X. (2021). Tensor Envelope Mixture Model for Simultaneous Clustering and Multiway Dimension Reduction. Biometrics.

See Also

TEMM, tune_u_sep

Examples

A = array(c(rep(1,20),rep(2,20))+rnorm(40),dim=c(2,2,10))
mytune = tune_u_joint(u_candi=list(1:2,1:2),K=2,A)

Tuning envelope dimension separately by BIC in TEMM.

Description

Tuning envelope dimension separately by BIC in TEMM.

Usage

tune_u_sep(m, u_candi, K, X, C = 1, oneD = TRUE, 
iter.max = 500, stop = 0.001, trueY = NULL)

Arguments

m

The tensor mode to be tuned, an integer in 1, ..., M.

u_candi

A vector of candidate envelope dimensions.

K

Number of clusters, greater than or equal to 2.

X

The tensor to be clustered, given as an array whose last dimension is the sample size n.

C

Constant in separate BIC criterion. Default value is 1.

oneD

Whether to apply 1D-BIC tuning. Default value is TRUE.

iter.max

Maximum number of iterations. Default value is 500.

stop

Convergence threshold of relative change in cluster means. Default value is 1e-3.

trueY

A vector of true cluster labels of each observation. Default value is NULL.

Details

For tensor mode $m=1,\dots,M$, the tune_u_sep function selects the envelope dimension $\widetilde{u}_m$ by minimizing the following BIC-type criterion over the set $\{0,1,\dots,p_m\}$:

$$\mathrm{BIC}_m(u_m) = \log|\bm{\Gamma}_m^T \widehat{\mathbf{M}}_m \bm{\Gamma}_m|+\log|\bm{\Gamma}_{m}^T \widehat{\mathbf{N}}_m^{-1} \bm{\Gamma}_{m}| + C \cdot u_m \log(n)/n.$$

Selecting the dimension of each mode separately is less sensitive to the complex interrelationships among the tensor modes. The default constant $C$ is set to $1$, as suggested by Zhang and Mai (2018).
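Given a basis $\bm{\Gamma}_m$ for a candidate dimension $u_m$ (in practice obtained by minimizing $G_m$; here it is simply passed in, which is an assumption of this sketch), the criterion $\mathrm{BIC}_m(u_m)$ can be evaluated as below. `bic_sep` is a hypothetical name, not the package API; for $u_m=0$ both log-determinants are taken to be zero, so the criterion equals $0$.

```r
# Sketch only; bic_sep is a hypothetical name, not the package API.
# BIC_m(u) = log|G' M G| + log|G' N^{-1} G| + C * u * log(n) / n,
# where G (p_m x u) is a basis for the candidate envelope of dimension u.
bic_sep <- function(Gamma, M, N, n, C = 1) {
  u <- ncol(Gamma)
  if (u == 0) return(0)  # empty basis: both log-determinants taken as 0
  logdet <- function(A) as.numeric(determinant(A)$modulus)
  logdet(t(Gamma) %*% M %*% Gamma) +
    logdet(t(Gamma) %*% solve(N) %*% Gamma) +
    C * u * log(n) / n
}
```

Computing `bic_sep` for each candidate in u_candi and keeping the minimizer gives the separate (per-mode) selection; the penalty $C\,u_m\log(n)/n$ grows linearly in $u_m$, so larger envelopes must reduce the two log-determinant terms to be chosen.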

Value

opt.u

Optimal envelope dimension selected.

bic

BIC value.

Author(s)

Kai Deng, Yuqing Pan, Xin Zhang and Qing Mai

References

Deng, K. and Zhang, X. (2021). Tensor Envelope Mixture Model for Simultaneous Clustering and Multiway Dimension Reduction. Biometrics.

Zhang, X. and Mai, Q. (2018). Model-free envelope dimension selection. Electronic Journal of Statistics 12, 2193-2216.

See Also

TEMM, tune_u_joint

Examples

A = array(c(rep(1,20),rep(2,20))+rnorm(40),dim=c(2,2,10))
mytune = tune_u_sep(1,1:2,K=2,A)