Package 'cluscov' reference manual

Title:	Clustered Covariate Regression
Description:	Clustered covariate regression enables estimation and inference in both linear and non-linear models with linear predictor functions even when the design matrix is column rank deficient. Routines in this package implement algorithms in Soale and Tsyawo (2019) <doi:10.13140/RG.2.2.32355.81441>.
Authors:	Emmanuel S Tsyawo [aut, cre], Abdul-Nasah Soale [aut]
Maintainer:	Emmanuel S Tsyawo <[email protected]>
License:	GPL-2
Version:	1.1.0
Built:	2025-02-18 07:29:20 UTC
Source:	CRAN

Concrete class constructor

Description

A function for constructing concrete classes of models for the chmod() family of functions.

Usage

c_chmod(Y, X, modclass = "lm")

Arguments

`Y`	vector of the outcome variable
`X`	matrix of covariates, excluding the column of 1's for the intercept
`modclass`	the class of model. Currently, "lm" for linear regression, "logit" (logit model), "qreg" (quantile regression), "probit" (probit model), "gammainverse" (gamma with inverse link), "gammalog" (gamma with log link), "poissonlog" (poisson model with log link), "poissonidentity" (poisson with identity link), "poissonsqrt" (poisson with sqrt link), "negbin" (negative binomial) are supported.

Value

object a list with class attribute modclass.
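
The pattern described here, a list that carries a class attribute used later for S3 dispatch, can be sketched in base R. The function name `make_model_obj` and its input checks are hypothetical illustrations, not the package's source.

```r
# Hypothetical stand-in for a concrete class constructor (not cluscov source).
make_model_obj <- function(Y, X, modclass = "lm") {
  supported <- c("lm", "logit", "qreg", "probit", "gammainverse", "gammalog",
                 "poissonlog", "poissonidentity", "poissonsqrt", "negbin")
  modclass <- match.arg(modclass, supported)  # reject unknown classes early
  X <- as.matrix(X)                           # allow a single covariate vector
  stopifnot(length(Y) == nrow(X))
  structure(list(Y = Y, X = X), class = modclass)
}

obj <- make_model_obj(Y = women$height, X = women$weight, modclass = "gammalog")
class(obj)
```

Storing the model class as the S3 class of the list lets a generic such as chmod() select the fitting routine without any explicit branching.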

Sequential CCR

Description

CCRls runs regressions with potentially more covariates than observations. See c_chmod() for the list of models supported.

Usage

CCRls(Y, X, kap = 0.1, modclass = "lm", tol = 1e-06, reltol = TRUE,
  rndcov = NULL, report = NULL, ...)

Arguments

`Y`	vector of dependent variable Y
`X`	design matrix (without intercept)
`kap`	maximum number of parameters to estimate in each active sequential step, as a fraction of the lesser of the total number of observations n and the number of covariates p, i.e. $\min(n,p)$
`modclass`	a string denoting the desired class of model. See c_chmod for details.
`tol`	level of tolerance for convergence; default `tol=1e-6`
`reltol`	a logical for relative tolerance instead of level. Defaults to TRUE
`rndcov`	seed for randomising assignment of covariates to partitions; default `NULL`
`report`	number of iterations after which to report progress; default `NULL`
`...`	additional arguments to be passed to the model

Value

betas parameter estimates (intercept first),

iter number of iterations,

dev increment in the objective function value at convergence

fval objective function value at convergence

Examples

set.seed(14) #Generate data
N = 1000; (bets = rep(-2:2,4)); p = length(bets); X = matrix(rnorm(N*p),N,p)
Y = cbind(1,X)%*%matrix(c(0.5,bets),ncol = 1)
CCRls(Y,X,kap=0.1,modclass="lm",tol=1e-6,reltol=TRUE,rndcov=NULL,report=8)
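
The difference between a relative and a level tolerance (`reltol`, `tol`) can be illustrated with a small base R check. This is only a sketch under assumptions: the helper name `has_converged`, its arguments, and the exact scaling are hypothetical and may differ from what CCRls does internally.

```r
# Hypothetical convergence check; names and scaling are assumptions.
has_converged <- function(f_old, f_new, tol = 1e-6, reltol = TRUE) {
  dev <- abs(f_old - f_new)                    # change in the objective value
  if (reltol) dev <- dev / max(abs(f_old), .Machine$double.eps)
  dev < tol
}

has_converged(1000, 999.9995, reltol = TRUE)   # relative change is about 5e-07
has_converged(1000, 999.9995, reltol = FALSE)  # level change is about 5e-04
```

With a large objective value, the same step can pass the relative criterion while failing the level criterion, which is why the choice matters when objective values vary in scale across models.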

Linear regression via coordinate descent with covariate clustering

Description

This function is a wrapper for linrclus. It requires fewer inputs.

Usage

CCRls.coord(Y, X, k, nC = 1, ...)

Arguments

`Y`	vector of outcome variable
`X`	matrix of covariates. Should not include 1's for the intercept
`k`	number of clusters
`nC`	first nC-1 covariates in X not to cluster. Must be at least 1 for the intercept
`...`	additional parameters to be passed to lm

Value

mobj the low dimension lm regression object

clus cluster assignments of covariates (excluding the first nC covariates - including the intercept 1)

Examples

set.seed(14) #Generate data
N = 1000; (bets = rep(-2:2,4)); p = length(bets); X = matrix(rnorm(N*p),N,p)
Y = cbind(1,X)%*%matrix(c(0.5,bets),ncol = 1)
CCRls.coord(Y,X,k=5,nC=1)

Sequential CCR with k clusters

Description

CCRseqk runs regressions with potentially more covariates than observations with k clusters. See c_chmod() for the list of models supported.

Usage

CCRseqk(Y, X, k, nC = 1, kap = 0.1, modclass = "lm", tol = 1e-06,
  reltol = TRUE, rndcov = NULL, report = NULL, ...)

Arguments

`Y`	vector of dependent variable Y
`X`	design matrix (without intercept)
`k`	number of clusters
`nC`	first `nC-1` columns in `X` not to cluster
`kap`	maximum number of parameters to estimate in each active sequential step, as a fraction of the lesser of the total number of observations n and the number of covariates p, i.e. $\min(n,p)$
`modclass`	a string denoting the desired class of model. See c_chmod for details.
`tol`	level of tolerance for convergence; default `tol=1e-6`
`reltol`	a logical for relative tolerance instead of level. Defaults to TRUE
`rndcov`	seed for randomising assignment of covariates to partitions; default `NULL`
`report`	number of iterations after which to report progress; default `NULL`
`...`	additional arguments to be passed to the model

Value

a list of objects

mobj low dimensional model object of class lm, glm, or rq (depending on modclass)
clus cluster assignments of covariates
iter number of iterations
dev decrease in the function value at convergence

Examples

set.seed(14) #Generate data
N = 1000; (bets = rep(-2:2,4)/2); p = length(bets); X = matrix(rnorm(N*p),N,p)
Y = cbind(1,X)%*%matrix(c(0.5,bets),ncol = 1); nC=1
zg=CCRseqk(Y,X,k=5,nC=nC,kap=0.1,modclass="lm",tol=1e-6,reltol=TRUE,rndcov=NULL,report=8)
(del=zg$mobj$coefficients) # delta
(bets = c(del[1:nC],(del[-c(1:nC)])[zg$clus])) #construct beta
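
The last line of the example maps the low-dimensional coefficients back to one coefficient per covariate. The step can be followed by hand with made-up values; the numbers below are illustrative only and are not output from CCRseqk.

```r
# Made-up values for illustration; in practice del and clus come from the fit.
nC   <- 1
del  <- c(0.5, -1, 0, 1)          # intercept, then one coefficient per cluster
clus <- c(1, 3, 2, 2, 1, 3)       # cluster label of each of the 6 covariates

# Each clustered covariate inherits the coefficient of its cluster.
beta <- c(del[1:nC], del[-(1:nC)][clus])
beta
```

Here covariates 1 and 5 share the first cluster coefficient, covariates 3 and 4 the second, and covariates 2 and 6 the third, so beta has length 7 (intercept plus six covariates).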

Model criterion function

Description

A generic S3 function that serves as a wrapper for internal R routines for the classes of models implemented in this package. See c_chmod for the list of classes supported.

Usage

chmod(object, ...)

Arguments

`object`	the object, created by the concrete class constructor `c_chmod`, to be passed to `chmod`
`...`	additional parameters to be passed to the internal routine
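
How an S3 generic routes an object to a class-specific fitting routine can be sketched in base R. The names `fit_model`, `fit_model.lm` and `fit_model.poissonlog` are hypothetical; the sketch mirrors the design described on this page but is not the package's code.

```r
# Hypothetical names (fit_model and its methods); not the package's source.
fit_model <- function(object, ...) UseMethod("fit_model")

# Method for objects of class "lm": ordinary least squares.
fit_model.lm <- function(object, ...) stats::lm(object$Y ~ object$X, ...)

# Method for objects of class "poissonlog": Poisson GLM with log link.
fit_model.poissonlog <- function(object, ...) {
  stats::glm(object$Y ~ object$X, family = stats::poisson(link = "log"), ...)
}

obj <- structure(list(Y = women$height, X = women$weight), class = "poissonlog")
fit <- fit_model(obj)
class(fit)
```

Adding support for a new model class then only requires a new method; the generic and the calling code stay unchanged.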

Regression - gammainverse class

Description

A gamma regression implementation for the "gammainverse" class. It uses glm with the Gamma family and the link function set to "inverse".

Usage

## S3 method for class 'gammainverse'
chmod(object, ...)

Arguments

`object`	a list of Y - outcome variable and X - design matrix of class "gammainverse"
`...`	additional parameters to be passed to `glm`

Value

fitted model object

Examples

chmod(c_chmod(Y=women$height,X=women$weight,modclass="gammainverse"))
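
For comparison, a direct base R call with the same family and link is shown below. It assumes the method fits an intercept plus the supplied covariate; whether it reproduces the chmod() result exactly has not been verified here.

```r
# Direct glm call; assumes the method adds an intercept and uses the Gamma family.
fit <- stats::glm(height ~ weight, data = women,
                  family = stats::Gamma(link = "inverse"))
coef(fit)          # intercept and slope on the inverse-link scale
fit$family$link
```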

Regression - gammalog class

Description

A gamma regression implementation for the "gammalog" class. It uses glm with the Gamma family and the link function set to "log".

Usage

## S3 method for class 'gammalog'
chmod(object, ...)

Arguments

`object`	a list of Y - outcome variable and X - design matrix of class "gammalog"
`...`	additional parameters to be passed to `glm`

Value

fitted model object

Examples

chmod(c_chmod(Y=women$height,X=women$weight,modclass="gammalog"))

Regression - lm class

Description

A linear regression implementation for the "lm" class. It uses lm

Usage

## S3 method for class 'lm'
chmod(object, ...)

Arguments

`object`	a list of Y - outcome variable and X - design matrix of class "lm"
`...`	additional parameters to be passed to `lm`

Value

fitted model object

Examples

chmod(c_chmod(Y=women$height,X=women$weight,modclass="lm"))

Regression - logit class

Description

A logit regression implementation for the "logit" class. It uses glm with the binomial family and the link function set to "logit".

Usage

## S3 method for class 'logit'
chmod(object, ...)

Arguments

`object`	a list of Y - outcome variable and X - design matrix of class "logit"
`...`	additional parameters to be passed to `glm`

Value

fitted model object

Examples

chmod(c_chmod(Y=women$height<=50,X=women$weight,modclass="logit"))

Regression - negbin class

Description

A negative binomial regression implementation for the "negbin" class. It uses glm.nb

Usage

## S3 method for class 'negbin'
chmod(object, ...)

Arguments

`object`	a list of Y - outcome variable and X - design matrix of class "negbin"
`...`	additional parameters to be passed to `glm.nb`

Value

fitted model object

Regression - poissonidentity class

Description

A Poisson regression implementation for the "poissonidentity" class. It uses glm with the poisson family and the link function set to "identity".

Usage

## S3 method for class 'poissonidentity'
chmod(object, ...)

Arguments

`object`	a list of Y - outcome variable and X - design matrix of class "poissonidentity"
`...`	additional parameters to be passed to `glm`

Value

fitted model object

Examples

chmod(c_chmod(Y=women$height,X=women$weight,modclass="poissonidentity"))

Regression - poissonlog class

Description

A Poisson regression implementation for the "poissonlog" class. It uses glm with the poisson family and the link function set to "log".

Usage

## S3 method for class 'poissonlog'
chmod(object, ...)

Arguments

`object`	a list of Y - outcome variable and X - design matrix of class "poissonlog"
`...`	additional parameters to be passed to `glm`

Value

fitted model object

Examples

chmod(c_chmod(Y=women$height,X=women$weight,modclass="poissonlog"))

Regression - poissonsqrt class

Description

A Poisson regression implementation for the "poissonsqrt" class. It uses glm with the poisson family and the link function set to "sqrt".

Usage

## S3 method for class 'poissonsqrt'
chmod(object, ...)

Arguments

`object`	a list of Y - outcome variable and X - design matrix of class "poissonsqrt"
`...`	additional parameters to be passed to `glm`

Value

fitted model object

Examples

chmod(c_chmod(Y=women$height,X=women$weight,modclass="poissonsqrt"))

Regression - probit class

Description

A probit regression implementation for the "probit" class. It uses glm with the binomial family and the link function set to "probit".

Usage

## S3 method for class 'probit'
chmod(object, ...)

Arguments

`object`	a list of Y - outcome variable and X - design matrix of class "probit"
`...`	additional parameters to be passed to `glm`

Value

fitted model object

Examples

chmod(c_chmod(Y=women$height<=50,X=women$weight,modclass="probit"))

Regression - qreg class

Description

A quantile regression implementation for the "qreg" class. It uses rq

Usage

## S3 method for class 'qreg'
chmod(object, ...)

Arguments

`object`	a list of Y - outcome variable and X - design matrix of class "qreg"
`...`	additional parameters to be passed to `rq`, for example `tau`

Value

fitted model object

Examples

chmod(c_chmod(Y=women$height,X=women$weight,modclass="qreg"),tau=0.45)

Clustering of vector elements

Description

A deterministic clustering device of vector elements into k clusters

Usage

dcluspar(k, vec)

Arguments

`k`	number of clusters
`vec`	the vector of real valued elements

Value

clus integer assignment of corresponding elements in vec in up to k clusters

Examples

set.seed(2); (v=c(rnorm(4,0,0.5),rnorm(3,3,0.5))[sample(1:7)])
dcluspar(k=2,vec = v)
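
To make the idea of a deterministic clustering of vector elements concrete, one simple rule is to sort the values and split at the k-1 largest gaps. The sketch below implements that rule only; it is an assumption chosen for illustration and is not claimed to be the algorithm behind dcluspar. The function name `gap_cluster` is hypothetical.

```r
# Illustrative rule only (an assumption): split sorted values at the k-1 largest gaps.
gap_cluster <- function(k, vec) {
  ord  <- order(vec)
  gaps <- diff(vec[ord])                          # gaps between sorted neighbours
  cuts <- order(gaps, decreasing = TRUE)[seq_len(k - 1)]
  # A new cluster starts right after each selected gap.
  lab_sorted <- cumsum(c(1L, seq_along(gaps) %in% cuts))
  clus <- integer(length(vec))
  clus[ord] <- lab_sorted                         # map labels back to input order
  clus
}

v <- c(0.1, 3.2, -0.4, 2.9, 0.3, 3.5, 0.0)
gap_cluster(k = 2, vec = v)
```

Unlike kmeans with random starts, this rule returns the same assignment on every call for the same input, which is the property the description above emphasises.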

Golden Section Search Algorithm

Description

Minimising a continuous univariate function using the golden section search algorithm.

Usage

goldensearch(fn, interval, tol = 1)

Arguments

`fn`	the function; should be scalar valued
`interval`	a vector containing the lower and upper bounds of search
`tol`	tolerance level for convergence

Value

a list of objects

k: minimiser
value: minimum value
iter: number of iterations before convergence
iterfn: number of function evaluations

Examples

fn = function(x) (x-1)^2; goldensearch(fn=fn,interval=c(-2,3),tol=1)
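
For readers unfamiliar with the method, a textbook golden section search in base R is sketched below. The function name `golden_min` is hypothetical, the returned element names merely echo those documented above, and the default `tol` here is smaller than the documented default of 1; the package's own implementation may differ in details.

```r
# Textbook golden section search; a sketch, not the package's implementation.
golden_min <- function(fn, interval, tol = 1e-6) {
  phi <- (sqrt(5) - 1) / 2                # inverse golden ratio, about 0.618
  a <- interval[1]; b <- interval[2]
  x1 <- b - phi * (b - a); x2 <- a + phi * (b - a)
  f1 <- fn(x1); f2 <- fn(x2)
  iter <- 0
  while ((b - a) > tol) {
    iter <- iter + 1
    if (f1 < f2) {                        # minimum lies in [a, x2]
      b <- x2; x2 <- x1; f2 <- f1
      x1 <- b - phi * (b - a); f1 <- fn(x1)
    } else {                              # minimum lies in [x1, b]
      a <- x1; x1 <- x2; f1 <- f2
      x2 <- a + phi * (b - a); f2 <- fn(x2)
    }
  }
  list(k = (a + b) / 2, value = fn((a + b) / 2), iter = iter)
}

res <- golden_min(function(x) (x - 1)^2, interval = c(-2, 3))
res$k
```

Each iteration shrinks the bracket by the factor phi and reuses one of the two interior points, so only one new function evaluation is needed per step.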

Integer Golden Search Minimisation

Description

This function conducts an integer golden search minimisation of a univariate function.

Usage

goldopt(fn, interval, tol = 1)

Arguments

`fn`	function to be minimised. fn should return a list, with fval as the function value.
`interval`	a vector of length two containing the minimum and maximum integer values within which to search for the minimiser.
`tol`	the tolerance level. Defaults to 1

Value

k minimiser of fn()

crit the minimum

iter total number of iterations

iterfn total number of function evaluations of fn()

fobj an object of the function minimisation

key a logical for warning if fobj may not correspond to k

Examples

set.seed(14) #Generate data
N = 1000; (bets = rep(-2:2,4)); p = length(bets); X = matrix(rnorm(N*p),N,p)
Y = cbind(1,X)%*%matrix(c(0.5,bets),ncol = 1)
fn=function(k){du=CCRls.coord(Y,X,k=k,nC=1)
return(list(fval=BIC(du$mobj),obj=du))}
goldopt(fn=fn,interval=c(2,7),tol=1)
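
When the integer interval is short, the result of an integer golden search can be cross-checked by evaluating the objective at every integer. The helper `exhaustive_int_min` below is hypothetical and uses only base R; it relies on the documented contract that `fn` returns a list with element `fval`, and a toy objective stands in for the BIC-based one above.

```r
# Brute-force integer minimisation; exhaustive_int_min is a hypothetical helper.
exhaustive_int_min <- function(fn, interval) {
  ks   <- seq.int(interval[1], interval[2])
  objs <- lapply(ks, fn)                               # one evaluation per integer
  fv   <- vapply(objs, function(o) o$fval, numeric(1))
  best <- which.min(fv)
  list(k = ks[best], crit = fv[best], iterfn = length(ks), fobj = objs[[best]])
}

# Toy objective following the documented contract: a list with element fval.
fn <- function(k) list(fval = (k - 4)^2 + 3, obj = NULL)
res <- exhaustive_int_min(fn, interval = c(2, 7))
c(res$k, res$crit)
```

The exhaustive search costs one evaluation per integer, so it is only practical as a check; the golden search is the cheaper choice when each evaluation fits a model.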

Linear regression via coordinate descent with covariate clustering

Description

Covariate assignment to k clusters using the coordinate descent algorithm. This function is a wrapper for the C function linreg_coord_clus

Usage

linrclus(Y, X, k, coefs, clus, clusmns, nC = 1, x = FALSE)

Arguments

`Y`	vector of outcome variable
`X`	matrix of covariates. Should not include 1's for the intercept
`k`	number of clusters
`coefs`	vector of coefficients as starting values. Should not include the intercept.
`clus`	vector of covariate cluster assignments as starting values
`clusmns`	vector of k cluster parameter centers
`nC`	first nC-1 covariates in X not to cluster. Must be at least 1 for the intercept
`x`	a logical for returning the design matrix

Value

clus cluster assignments

coefs vector of coefficients as starting values

clusmns vector of cluster means

Examples

set.seed(14) #Generate data
N = 1000; (bets = rep(-2:2,4)); p = length(bets); X = matrix(rnorm(N*p),N,p)
Y = cbind(1,X)%*%matrix(c(0.5,bets),ncol = 1)
begin_v<- rep(NA,p)
for (j in 1:p) {
 begin_v[j] = stats::coef(lm(Y~X[,j]))[2]
}
set.seed(12); klus_obj<- kmeans(begin_v,centers = 5)
linrclus(Y,X,k=5,coefs=c(0,begin_v),clus=klus_obj$cluster,clusmns=klus_obj$centers)

Construct a network design matrix

Description

This function creates the design matrix for a latent network structure using a balanced panel

Usage

netdat(datf, Y, X, Wi, W = NULL, panvar, tvar, factors, scaling = TRUE,
  unicons = TRUE)

Arguments

`datf`	the entire data frame of balanced panel with NT rows of unit-time observations
`Y`	dependent variable in the data frame datf
`X`	the covariate(s) generating spillovers
`Wi`	other unit-varying (can be time-invariant) control variables
`W`	global variables: time-varying variables that are common to all units, e.g. GDP for individual/state-level data. Note that W must be a vector of length T, so it cannot be a column of the data frame datf
`panvar`	the panel variable eg. unique person/firm identifiers
`tvar`	time variable, eg. years
`factors`	a vector of characters of factors in the data
`scaling`	a logical indicating whether non-discrete covariates should be scaled by their standard deviations
`unicons`	a logical indicating whether to include unit-specific constant term

Value

Y vector of dependent variables

X a block spillover matrix ($TN \times N^2$)

Wm a matrix corresponding to covariate Wi

Wf a matrix of dummies corresponding to factors
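
Because the function expects a balanced panel with NT rows, it can help to check the data before building the design matrix. The helper `is_balanced_panel` below is hypothetical and assumes that the panel and time variables are given as column names of the data frame.

```r
# Hypothetical helper: TRUE when every unit is observed exactly once per period.
is_balanced_panel <- function(datf, panvar, tvar) {
  counts <- table(datf[[panvar]], datf[[tvar]])
  all(counts == 1)
}

# Toy panel with N = 3 units and T = 4 periods, so NT = 12 rows.
datf <- expand.grid(id = 1:3, year = 2001:2004)
datf$y <- seq_len(nrow(datf))
is_balanced_panel(datf, panvar = "id", tvar = "year")        # TRUE
is_balanced_panel(datf[-1, ], panvar = "id", tvar = "year")  # FALSE
```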
