Package 'ClusPred' reference manual

Title:	Simultaneous Semi-Parametric Estimation of Clustering and Regression
Description:	Parameter estimation of regression models with fixed group effects, when the group variable is missing while group-related variables are available. Parametric and semi-parametric approaches described in Marbac et al. (2020) <arXiv:2012.14159> are implemented.
Authors:	Matthieu Marbac [aut, cre, cph], Mohammed Sedki [aut], Christophe Biernacki [aut], Vincent Vandewalle [aut]
Maintainer:	Matthieu Marbac <[email protected]>
License:	GPL (>= 2)
Version:	1.1.0
Built:	2025-02-19 06:43:00 UTC
Source:	CRAN

ClusPred.

Description

Parameter estimation of regression models with fixed group effects, when the group variable is missing while group-related variables are available.

Details

Package:	ClusPred
Type:	Package
Version:	1.1.0
Date:	2021-12-01
License:	GPL-3
LazyLoad:	yes

References

Simultaneous semi-parametric estimation of clustering and regression, Matthieu Marbac and Mohammed Sedki and Christophe Biernacki and Vincent Vandewalle (2020) <arXiv:2012.14159>.

Function used for clustering and fitting the regression model

Description

Estimation of the group-variable Z based on covariates X and estimation of the parameters of the regression of Y on (U, Z)

Usage

cluspred(
  y,
  x,
  u = NULL,
  K = 2,
  model.reg = "mean",
  tau = 0.5,
  simultaneous = TRUE,
  np = TRUE,
  nbinit = 20,
  nbCPU = 1,
  tol = 0.01,
  band = (length(y)^(-1/5)),
  seed = 134
)
cluspred(
  y,
  x,
  u = NULL,
  K = 2,
  model.reg = "mean",
  tau = 0.5,
  simultaneous = TRUE,
  np = TRUE,
  nbinit = 20,
  nbCPU = 1,
  tol = 0.01,
  band = (length(y)^(-1/5)),
  seed = 134
)

Arguments

`y`	numeric vector of the traget variable (must be numerical)
`x`	matrix used for clustering (can contain numerical and factors)
`u`	matrix of the covariates used for regression (can contain numerical and factors)
`K`	number of clusters
`model.reg`	indicates the type of the loss ("mean", "quantile", "expectile", "logcosh", "huber"). Only the losses "mean" and "quantile" are implemented if simultaneous=FALSE or np=FALSE
`tau`	specifies the level for the loss (quantile, expectile or huber)
`simultaneous`	oolean indicating whether the clustering and the regression are performed simultaneously (TRUE) or not (FALSE)
`np`	boolean indicating whether nonparameteric model is used (TRUE) or not (FALSE)
`nbinit`	number of random initializations
`nbCPU`	number of CPU only used for linux
`tol`	to specify the stopping rule
`band`	bandwidth selection
`seed`	value of the seed (used for drawing the starting points)

Value

cluspred returns a list containing the model parameters (param), the posterior probabilities of cluster memberships (tik), the partition (zhat) and the (smoothed) loglikelihood)

References

Simultaneous semi-parametric estimation of clustering and regression, Matthieu Marbac and Mohammed Sedki and Christophe Biernacki and Vincent Vandewalle (2020) <arXiv:2012.14159>.

Examples

require(ClusPred)
# data loading
data(simdata)

# mean regression with two latent groups in parametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
 np=FALSE, nbCPU = 1, nbinit = 10)
# coefficient of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# loglikelihood
res$loglike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = FALSE)
# predicted cluster membreships
pred$zhat
# predicted value of the target variable
pred$yhat


# median regression with two latent groups in nonparametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
model.reg = "quantile", tau = 0.5, nbinit = 10)
# coefficient of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# smoothed loglikelihood
res$logSmoothlike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = TRUE)
# predicted cluster membreships
pred$zhat
# predicted value of the target variable
pred$yhat



require(ClusPred)
# data loading
data(simdata)

# mean regression with two latent groups in parametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
 np=FALSE, nbCPU = 1, nbinit = 10)
# coefficient of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# loglikelihood
res$loglike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = FALSE)
# predicted cluster membreships
pred$zhat
# predicted value of the target variable
pred$yhat


# median regression with two latent groups in nonparametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
model.reg = "quantile", tau = 0.5, nbinit = 10)
# coefficient of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# smoothed loglikelihood
res$logSmoothlike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = TRUE)
# predicted cluster membreships
pred$zhat
# predicted value of the target variable
pred$yhat

Prediction (clustering and target variable)

Description

Prediction for new observations

Usage

predictboth(x, u = NULL, result, np = FALSE)
predictboth(x, u = NULL, result, np = FALSE)

Arguments

`x`	covariates used for clustering
`u`	covariates of the regression (can be null)
`result`	results provided by function cluspred
`np`	boolean indicating whether nonparametric estimation is used (TRUE) or not (FALSE)

Value

predictboth returns a list containing the predicted cluster membership (zhat) and the predicted value of the target variable (yhat).

Examples

require(ClusPred)
# data loading
data(simdata)

# mean regression with two latent groups in parametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
np=FALSE, nbCPU = 1, nbinit = 10)
# coefficient of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# loglikelihood
res$loglike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = FALSE)
# predicted cluster membreships
pred$zhat
# predicted value of the target variable
pred$yhat


# median regression with two latent groups in nonparametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
 model.reg = "quantile", tau = 0.5, nbinit = 10)
# coefficient of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# smoothed loglikelihood
res$logSmoothlike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = TRUE)
# predicted cluster membreships
pred$zhat
# predicted value of the target variable
pred$yhat



require(ClusPred)
# data loading
data(simdata)

# mean regression with two latent groups in parametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
np=FALSE, nbCPU = 1, nbinit = 10)
# coefficient of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# loglikelihood
res$loglike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = FALSE)
# predicted cluster membreships
pred$zhat
# predicted value of the target variable
pred$yhat


# median regression with two latent groups in nonparametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
 model.reg = "quantile", tau = 0.5, nbinit = 10)
# coefficient of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# smoothed loglikelihood
res$logSmoothlike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = TRUE)
# predicted cluster membreships
pred$zhat
# predicted value of the target variable
pred$yhat

Simulated data

Description

simulated data used for the pacakge examples.

Examples

data(simdata)
data(simdata)

Package 'ClusPred'

Help Index

ClusPred.

Description

Details

References

Function used for clustering and fitting the regression model

Description

Usage

Arguments

Value

References

Examples

Prediction (clustering and target variable)

Description

Usage

Arguments

Value

Examples

Simulated data

Description

Examples