Package 'cgam'

Title: Constrained Generalized Additive Model
Description: A constrained generalized additive model is fitted by the cgam routine. Given a set of predictors, each of which may have a shape or order restrictions, the maximum likelihood estimator for the constrained generalized additive model is found using an iteratively re-weighted cone projection algorithm. The ShapeSelect routine chooses a subset of predictor variables and describes the component relationships with the response. For each predictor, the user needs only specify a set of possible shape or order restrictions. A model selection method chooses the shapes and orderings of the relationships as well as the variables. The cone information criterion (CIC) is used to select the best combination of variables and shapes. A genetic algorithm may be used when the set of possible models is large. In addition, the cgam routine implements a two-dimensional isotonic regression using warped-plane splines without additivity assumptions. It can also fit a convex or concave regression surface with triangle splines without additivity assumptions. See Liao X, Meyer MC (2019)<doi:10.18637/jss.v089.i05> for more details.
Authors: Mary Meyer [aut], Xiyue Liao [aut, cre]
Maintainer: Xiyue Liao <[email protected]>
License: GPL (>= 2)
Version: 1.21
Built: 2024-12-11 07:27:45 UTC
Source: CRAN

Help Index


Extract the Best Fit Returned by the ShapeSelect Routine

Description

The is a subroutine which only works for the ShapeSelect routine. It returns an object of the cgam class given the variables and their shapes chosen by the ShapeSelect routine.

Usage

best.fit(x)

Arguments

x

x is an object of the ShapeSelect class.

Value

object

The best fit returned by the ShapeSelect routine, which is an object of the cgam class.

Author(s)

Xiyue Liao

See Also

cgam, ShapeSelect

Examples

## Not run: 
  library(MASS)
  data(Rubber)

  # do a variable and shape selection with four possible shapes
  # increasing, decreasing, convex and concave 
  ans <- ShapeSelect(loss ~ shapes(hard, set = c("incr", "decr", "conv", "conc"))
  + shapes(tens, set = c("incr", "decr", "conv", "conc")), data = Rubber, genetic = TRUE)

  # check the best fit, which is an object of the cgam class
  bf <- best.fit(ans)
  class(bf)
  plotpersp(bf)

## End(Not run)

Constrained Generalized Additive Model Fitting

Description

The partial linear generalized additive model is fitted using the method of maximum likelihood, where shape or order restrictions can be imposed on the non-parametrically modelled predictors with optional smoothing, and no restrictions are imposed on the optional parametrically modelled covariate.

Usage

cgam(formula, cic = FALSE, nsim = 100, family = gaussian, cpar = 1.5, 
	data = NULL, weights = NULL, sc_x = FALSE, sc_y = FALSE, pnt = TRUE, 
	pen = 0, var.est = NULL, gcv = FALSE, pvf = TRUE)

Arguments

formula

A formula object which gives a symbolic description of the model to be fitted. It has the form "response ~ predictor". The response is a vector of length nn. The specification of the model can be one of the three exponential families: gaussian, binomial and poisson. The systematic component η\eta is E(y)E(y), the log odds of y=1y = 1, and the logarithm of E(y)E(y) respectively. A predictor can be a non-parametrically modelled variable with or without a shape or order restriction, or a parametrically modelled unconstrained covariate. In terms of a non-parametrically modelled predictor, the user is supposed to indicate the relationship between the systematic component η\eta and a predictor xx in the following way:

Assume that η\eta is the systematic component and xx is a predictor:

  • incr(x): η\eta is increasing in xx. See incr for more details.

  • s.incr(x): η\eta is smoothly increasing in xx. See s.incr for more details.

  • decr(x): η\eta is decreasing in xx. See decr for more details.

  • s.decr(x): η\eta is smoothly decreasing in xx. See s.decr for more details.

  • conc(x): η\eta is concave in xx. See conc for more details.

  • s.conc(x): η\eta is smoothly concave in xx. See s.conc for more details.

  • conv(x): η\eta is convex in xx. See conv for more details.

  • s.conv(x): η\eta is smoothly convex in xx. See s.conv for more details.

  • incr.conc(x): η\eta is increasing and concave in xx. See incr.conc for more details.

  • s.incr.conc(x): η\eta is smoothly increasing and concave in xx. See s.incr.conc for more details.

  • decr.conc(x): η\eta is decreasing and concave in xx. See decr.conc for more details.

  • s.decr.conc(x): η\eta is smoothly decreasing and concave in xx. See s.decr.conc for more details.

  • incr.conv(x): η\eta is increasing and convex in xx. See incr.conv for more details.

  • s.incr.conv(x): η\eta is smoothly increasing and convex in xx. See s.incr.conv for more details.

  • decr.conv(x): η\eta is decreasing and convex in xx. See decr.conv for more details.

  • s.decr.conv(x): η\eta is smoothly decreasing and convex in xx. See s.decr.conv for more details.

  • s(x): η\eta is smooth in xx. See s for more details.

  • tree(x): η\eta has a tree-ordering in xx. See tree for more details.

  • umbrella(x): η\eta has an umbrella-ordering in xx. See umbrella for more details.

cic

Logical flag indicating if or not simulations are used to get the cic value. The default is cic = FALSE

nsim

The number of simulations used to get the cic parameter. The default is nsim = 100.

family

A parameter indicating the error distribution and link function to be used in the model. It can be a character string naming a family function or the result of a call to a family function. This is borrowed from the glm routine in the stats package. There are four families used in csvy: Gaussian, binomial, poisson, and Gamma. Note that if family = Ord is specified, a proportional odds regression model with shape constraints is fitted. This is under development.

cpar

A multiplier to estimate the model variance, which is defined as σ2=SSR/(ncparedf)\sigma^2 = SSR / (n - cpar * edf). SSR is the sum of squared residuals for the full model and edf is the effective degrees of freedom. The default is cpar = 1.2. The user-defined value must be between 1 and 2. See Meyer, M. C. and M. Woodroofe (2000) for more details.

data

An optional data frame, list or environment containing the variables in the model. The default is data = NULL.

weights

An optional non-negative vector of "replicate weights" which has the same length as the response vector. If weights are not given, all weights are taken to equal 1. The default is weights = NULL.

sc_x

Logical flag indicating if or not continuous predictors are normalized. The default is sc_x = FALSE.

sc_y

Logical flag indicating if or not the response variable is normalized. The default is sc_y = FALSE.

pen

User-defined penalty parameter. It must be non-negative. It will only be used in a warped-plane spline fit or a triangle spline fit. The default is pen = 0.

pnt

Logical flag indicating if or not penalized constrained regression splines are used. It will only be used in a warped-plane spline fit or a triangle spline fit. The default is pnt = TRUE.

var.est

To do a monotonic variance function estimation, the user can set var.est = s.incr(x) or var.est = s.decr(x). See s.incr and s.decr for more details. The default is var.est = NULL.

gcv

Logical flag indicating if or not gcv is used to choose a penalty term in warped-plane surface fit. The default is gcv = FALSE.

pvf

Logical flag indicating if or not simulations are used to find the p-value of the test of linear vs double monotone in warped plane surface fit.

Details

We consider generalized partial linear models with independent observations from an exponential family of the form p(yi;θ,τ)=exp[{yiθib(θi)}τc(yi,τ)],i=1,,np(y_i;\theta,\tau) = exp[\{y_i\theta_i - b(\theta_i)\}\tau - c(y_i, \tau)], i = 1,\ldots,n, where the specifications of the functions bb and cc determine the sub-family of models. The mean vector μ=E(y)\mu = E(y) has values μi=b(θi)\mu_i = b'(\theta_i), and is related to a design matrix of predictor variables through a monotonically increasing link function g(μi)=ηi,i=1,,ng(\mu_i) = \eta_i, i = 1,\ldots,n, where η\eta is the systematic component and describes the relationship with the predictors. The relationship between η\eta and θ\theta is determined by the link function bb.

For the additive model, the systematic component is specified for each observation by ηi=f1(x1i)++fL(xLi)+ziβ\eta_i = f_1(x_{1i}) + \ldots + f_L(x_{Li}) + z_i'\beta, where the functions flf_l describe the relationships of the non-parametrically modelled predictors xlx_l, β\beta is a parameter vector, and ziz_i contains the values of variables to be modelled parametrically. The non-parametric components are modelled with shape or order assumptions with optional smoothing, and the solution is obtained through an iteratively re-weighted cone projection, with no back-fitting of individual components.

Suppose that η\eta is a nn by 11 vector. The matrix form of the systematic component and the predictor is η=ϕ1++ϕL+Zβ\eta = \phi_1 + \ldots + \phi_L + Z\beta, where ϕl\phi_l is the individual component for the llth non-parametrically modelled predictor, l=1,,Ll = 1, \ldots, L, and ZZ is an nn by pp design matrix for the parametrically modelled covariate.

To model the component ϕl\phi_l, smooth regression splines or non-smooth ordinal basis functions can be used. The constraints for the component ϕl\phi_l are in ClC_l. In the first case, ClC_l = {ϕlRn:ϕl=vl+Blβl\{\phi_l \in R^n: \phi_l = v_l+B_l\beta_l, where βl0\beta_l \ge 0 and vlVl}v_l\in V_l \}, where BlB_l has regression splines as columns and VlV_l is the linear space contained in ClC_l, and in the second case, ClC_l = {ϕRn:Alϕ0\{\phi \in R^n: A_l\phi \ge 0 and Blϕ=0}B_l\phi = 0\}, for inequality constraint matrix AlA_l and equality constraint matrix BlB_l.

The set ClC_l is a convex cone and the set C=C1++Cp+ZC = C_1 + \ldots + C_p + Z is also a convex cone with a finite set of edges, where the edges are the generators of CC, and ZZ is the column space of the design matrix ZZ for the parametrically modelled covariate.

An iteratively re-weighted cone projection algorithm is used to fit the generalized regression model over the cone CC.

See references cited in this section and the official manual (https://cran.r-project.org/package=coneproj) for the R package coneproj for more details.

Value

etahat

The fitted systematic component η\eta.

muhat

The fitted mean value, obtained by transforming the systematic component η\eta by the inverse of the link function.

vcoefs

The estimated coefficients for the basis spanning the null space of the constraint set.

xcoefs

The estimated coefficients for the edges corresponding to the smooth predictors with no shape constraint and shape-restricted predictors.

zcoefs

The estimated coefficients for the parametrically modelled covariate, i.e., the estimation for the vector β\beta.

ucoefs

The estimated coefficients for the edges corresponding to the predictors with an umbrella-ordering constraint.

tcoefs

The estimated coefficients for the edges corresponding to the predictors with a tree-ordering constraint.

coefs

The estimated coefficients for the basis spanning the null space of the constraint set and edges corresponding to the shape-restricted and order-restricted predictors.

cic

The cone information criterion proposed in Meyer(2013a). It uses the "null expected degrees of freedom" as a measure of the complexity of the model. See Meyer(2013a) for further details of cic.

d0

The dimension of the linear space contained in the cone generated by all constraint conditions.

edf0

The estimated "null expected degrees of freedom". It is a measure of the complexity of the model. See Meyer (2013a) and Meyer (2013b) for further details.

edf

The constrained effective degrees of freedom.

etacomps

The fitted systematic component value for non-parametrically modelled predictors. It is a matrix of which each row is the fitted systematic component value for a non-parametrically modelled predictor. If there are more than one such predictors, the order of the rows is the same as the order that the user defines such predictors in the formula argument of cgam.

y

The response variable.

xmat_add

A matrix whose columns represent the shape-restricted predictors and smooth predictors with no shape constraint.

zmat

A matrix whose columns represent the basis for the parametrically modelled covariate. The user can choose to include a constant vector in it or not. It must have full column rank.

ztb

A list keeping track of the order of the parametrically modelled covariate.

tr

A matrix whose columns represent the predictors with a tree-ordering constraint.

umb

A matrix whose columns represent the predictors with an umbrella-ordering constraint.

tree.delta

A matrix whose rows are the edges corresponding to the predictors with a tree-ordering constraint.

umbrella.delta

A matrix whose rows are the edges corresponding to the predictors with an umbrella-ordering constraint.

bigmat

A matrix whose rows are the basis spanning the null space of the constraint set and the edges corresponding to the shape-restricted and order-restricted predictors.

shapes

A vector including the shape and partial-ordering constraints in a cgam fit.

shapesx

A vector including the shape constraints in a cgam fit.

prior.w

User-defined weights.

wt

The weights in the final iteration of the iteratively re-weighted cone projections.

wt.iter

Logical flag indicating if or not iteratively re-weighted cone projections may be used. If the response is gaussian, then wt.iter = FALSE; if the response is binomial or poisson, then wt.iter = TRUE.

family

The family parameter defined in a cgam formula.

SSE0

The sum of squared residuals for the linear part.

SSE1

The sum of squared residuals for the full model.

pvals.beta

The approximate p-values for the estimation of the vector β\beta. A t-distribution is used as the approximate distribution.

se.beta

The standard errors for the estimation of the vector β\beta.

null_df

The degree of freedom for the null model of a cgam fit, i.e., the model only containing a constant vector.

df

The degree of freedom for the null space of a cgam fit.

resid_df_obs

The observed degree of freedom for the residuals of a cgam fit.

null_deviance

The deviance for the null model of a cgam fit, i.e., the model only containing a constant vector.

deviance

The residual deviance of a cgam fit.

tms

The terms objects extracted by the generic function terms from a cgam fit.

capm

The number of edges corresponding to the shape-restricted predictors.

capms

The number of edges corresponding to the smooth predictors with no shape constraint.

capk

The number of non-constant columns of zmat.

capt

The number of edges corresponding to the tree-ordering predictors.

capu

The number of edges corresponding to the umbrella-ordering predictors.

xid1

A vector keeping track of the beginning position of the set of edges in bigmat for each shape-restricted predictor and smooth predictor with no shape constraint in xmat.

xid2

A vector keeping track of the end position of the set of edges in bigmat for each shape-restricted predictor and smooth predictor with no shape constraint in xmat.

tid1

A vector keeping track of the beginning position of the set of edges in bigmat for each tree-ordering factor in tr.

tid2

A vector keeping track of the end position of the set of edges in bigmat for each tree-ordering factor in tr.

uid1

A vector keeping track of the beginning position of the set of edges in bigmat for each umbrella-ordering factor in umb.

uid2

A vector keeping track of the end position of the set of edges in bigmat for each umbrella-ordering factor in umb.

zid

A vector keeping track of the positions of the parametrically modelled covariate.

vals

A vector storing the levels of each variable used as a factor.

zid1

A vector keeping track of the beginning position of the levels of each variable used as a factor.

zid2

A vector keeping track of the end position of the levels of each variable used as a factor.

nsim

The number of simulations used to get the cic parameter.

xnms

A vector storing the names of the shape-restricted predictors and the smooth predictors with no shape constraint in xmat.

ynm

The name of the response variable.

znms

A vector storing the names of the parametrically modelled covariate.

is_param

A logical scalar showing if or not a variable is a parametrically modelled covariate, which could be a linear term or a factor.

is_fac

A logical scalar showing if or not a variable is a factor.

knots

A list storing the knots used for each shape-restricted predictor and smooth predictor with no shape constraint. For a smooth, constrained and a smooth, unconstrainted predictor, knots is a vector of more than 11 elements, and for a shape-restricted predictor without smoothing, knots = 00.

numknots

A vector storing the number of knots for each shape-restricted predictor and smooth predictor with no shape constraint. For a smooth, constrained and a smooth, unconstrainted predictor, numknots > 11, and for a shape-restricted predictor without smoothing, numknots = 00.

sps

A character vector storing the space parameter to create knots for each shape-restricted predictor.

ms

The centering terms used to make edges for shape-restricted predictors.

cpar

The cpar argument in the cgam formula

vh

The estimated monotonic variance function.

kts.var

The knots used in monotonic variance function estimation.

call

The matched call.

Author(s)

Mary C. Meyer and Xiyue Liao

References

Liao, X. and Meyer, M. C. (2019) cgam: An R Package for the Constrained Generalized Additive Model. Journal of Statistical Software 89(5), 1–24.

Meyer, M. C. (2018) A Framework for Estimation and Inference in Generalized Additive Models with Shape and Order Restrictions. Statistical Science 33(4), 595–614.

Meyer, M. C. (2013a) Semi-parametric additive constrained regression. Journal of Nonparametric Statistics 25(3), 715.

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

Meyer, M. C. and M. Woodroofe (2000) On the degrees of freedom in shape-restricted regression. Annals of Statistics 28, 1083–1104.

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

Mammen, E. and K. Yu (2007) Additive isotonic regression. IMS Lecture Notes-Monograph Series Asymptotics: Particles, Process, and Inverse Problems 55, 179–195.

Huang, J. (2002) A note on estimating a partly linear model under monotonicity constraints. Journal of Statistical Planning and Inference 107, 343–351.

Cheng, G.(2009) Semiparametric additive isotonic regression. Journal of Statistical Planning and Inference 139, 1980–1991.

Bacchetti, P. (1989) Additive isotonic models. Journal of the American Statistical Association 84(405), 289–294.

Examples

# Example 1.
  data(cubic)
  # extract x
  x <- cubic$x

  # extract y
  y <- cubic$y

  # regress y on x with no restriction with lm()
  fit.lm <- lm(y ~ x + I(x^2) + I(x^3))

  # regress y on x under the restriction: "increasing and convex"
  fit.cgam <- cgam(y ~ incr.conv(x))

  # make a plot to compare the two fits
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, fit.cgam$muhat, col = 2, lty = 2)
  lines(x, fitted(fit.lm), col = 1, lty = 1)
  legend("topleft", bty = "n", c("constrained cgam fit", "unconstrained lm fit"), 
  lty = c(2, 1), col = c(2, 1))

# Example 2.
## Not run: 
  library(gam)
  data(kyphosis)
  
  # regress Kyphosis on Age, Number, and Start under the restrictions:
  # "concave", "increasing and concave", and "decreasing and concave" 
  fit <- cgam(Kyphosis ~ conc(Age) + incr.conc(Number) + decr.conc(Start), 
  family = binomial(), data = kyphosis) 

## End(Not run)

# Example 3.
  library(MASS)
  data(Rubber)
  
  # regress loss on hard and tens under the restrictions:
  # "decreasing" and "decreasing"
  fit.cgam <- cgam(loss ~ decr(hard) + decr(tens), data = Rubber)
  # "smooth and decreasing" and "smooth and decreasing"
  fit.cgam.s <- cgam(loss ~ s.decr(hard) + s.decr(tens), data = Rubber)
  summary(fit.cgam.s)
  anova(fit.cgam.s)
  
  # make a 3D plot based on fit.cgam and fit.cgam.s
  plotpersp(fit.cgam, th = 120, main = "3D Plot of a Cgam Fit")
  plotpersp(fit.cgam.s, tens, hard, data = Rubber, th = 120, main = "3D Plot of a Smooth Cgam Fit")

# Example 4. monotonic variance estimation
  n <- 400
  x <- runif(n)
  sig <- .1 + exp(15*x-8)/(1+exp(15*x-8))
  e <- rnorm(n)
  mu <- 10*x^2
  y <- mu + sig*e

  fit <- cgam(y ~ s.incr.conv(x), var.est = s.incr(x))
  est.var <- fit$vh
  muhat <- fit$muhat

  par(mfrow = c(1, 2))
  plot(x, y)
  points(sort(x), muhat[order(x)], type = "l", lwd = 2, col = 2)
  lines(sort(x), (mu)[order(x)], col = 4)

  plot(sort(x), est.var[order(x)], col=2, lwd=2, type="l", 
  lty=2, ylab="Variance", ylim=c(0, max(c(est.var, sig^2))))
  points(sort(x), (sig^2)[order(x)], col=1, lwd=2, type="l")

# Example 5. monotonic variance estimation with the lidar data set in SemiPar
  library(SemiPar)
  data(lidar)

  fit <- cgam(logratio ~ s.decr(range), var.est=s.incr(range), data=lidar)
  muhat <- fit$muhat
  est.var <- fit$vh
  
  logratio <- lidar$logratio
  range <- lidar$range
  pfit <- predict(fit, newData=data.frame(range=range), interval="confidence", level=0.95)
  upp <- pfit$upper
  low <- pfit$lower
  
  par(mfrow = c(1, 2))
  plot(range, logratio)
  points(sort(range), muhat[order(range)], type = "l", lwd = 2, col = 2)
  lines(sort(range), upp[order(range)], type = "l", lwd = 2, col = 4)
  lines(sort(range), low[order(range)], type = "l", lwd = 2, col = 4)
  title("Smoothly Decreasing Fit with a Point-Wise Confidence Interval", cex.main=0.5)
  
  plot(range, est.var, col=2, lwd=2, type="l",lty=2, ylab="variance")
  title("Smoothly Increasing Variance", cex.main=0.5)

Constrained Generalized Additive Mixed-Effects Model Fitting

Description

This routine is an addition to the main routine cgam in this package. A random-intercept component is included in a cgam model.

Usage

cgamm(formula, nsim = 0, family = gaussian(), cpar = 1.2, data = NULL, weights = NULL,
sc_x = FALSE, sc_y = FALSE, bisect = TRUE, reml = TRUE, nAGQ = 1L)

Arguments

formula

A formula object which gives a symbolic description of the model to be fitted. It has the form "response ~ predictor + (1|id)", where id is the label for a group effect. For now, only gaussian responses are considered and this routine only includes a random-intercept effect. See cgam for more details.

nsim

The number of simulations used to get the cic parameter. The default is nsim = 0.

family

A parameter indicating the error distribution and link function to be used in the model. For now, the options are family = gaussian() and family = binomial().

cpar

A multiplier to estimate the model variance, which is defined as σ2=SSR/(ncparedf)\sigma^2 = SSR / (n - cpar * edf). SSR is the sum of squared residuals for the full model and edf is the effective degrees of freedom. The default is cpar = 1.2. The user-defined value must be between 1 and 2. See Meyer, M. C. and M. Woodroofe (2000) for more details.

data

An optional data frame, list or environment containing the variables in the model. The default is data = NULL.

weights

An optional non-negative vector of "replicate weights" which has the same length as the response vector. If weights are not given, all weights are taken to equal 1. The default is weights = NULL.

sc_x

Logical flag indicating if or not continuous predictors are normalized. The default is sc_x = FALSE.

sc_y

Logical flag indicating if or not the response variable is normalized. The default is sc_y = FALSE.

bisect

If bisect = TRUE, a 95 percent confidence interval will be found for the variance ratio parameter by a bisection method.

reml

If reml = TRUE, restricted maximum likelihood (REML) method will be used to find estimates instead of maximum likelihood estimation (MLE).

nAGQ

Integer scalar - the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood. Defaults to 1, corresponding to the Laplace approximation. Values greater than 1 produce greater accuracy in the evaluation of the log-likelihood at the expense of speed.

Value

muhat

The fitted fixed-effect term.

ahat

A vector of estimated random-effect terms.

sig2hat

Estimate of the variance (σ2\sigma^2) of between-cluster error terms.

siga2hat

Estimate of the variance (σa2\sigma_a^2) of within-cluster error terms.

thhat

Estimate of the ratio (θ\theta) of two variances.

pv.siga2

pp-value of the test H0:σa2=0H_0: \sigma_a^2=0

ci.siga2

95 percent confidence interval for the variance of within-cluster error terms.

ci.th

95 percent confidence interval for ratio of two variances.

ci.rho

95 percent confidence interval for intraclass correlation coefficient.

ci.sig2

95 percent confidence interval for the variance of between-cluster error terms.

call

The matched call.

Author(s)

Xiyue Liao

Examples

# Example 1 (family = gaussian).

# simulate a balanced data set with 30 clusters
# each cluster has 30 data points
	n <- 30
	m <- 30

# the standard deviation of between cluster error terms is 1
# the standard deviation of within cluster error terms is 2
	sige <- 1
	siga <- 2

# generate a continuous predictor
	x <- 1:(m*n)
	for(i in 1:m) {
		x[(n*(i-1)+1):(n*i)] <- round(runif(n), 3)
	}
# generate a group factor
	group <- trunc(0:((m*n)-1)/n)+1

# generate the fixed-effect term
	mu <- 10*exp(10*x-5)/(1+exp(10*x-5))

# generate the random-intercept term asscosiated with each group
	avals <- rnorm(m, 0, siga)

# generate the response
	y <- 1:(m*n)
	for(i in 1:m){
		y[group == i] <- mu[group == i] + avals[i] + rnorm(n, 0, sige)
	}

# use REML method to fit the model
	ans <- cgamm(y ~ s.incr(x) + (1|group), reml=TRUE)
	summary(ans)
	anova(ans)

	muhat <- ans$muhat
	plot(x, y, col = group, cex = .6)
	lines(sort(x), mu[order(x)], lwd = 2)
	lines(sort(x), muhat[order(x)], col = 2, lty = 2, lwd = 2)
	legend("topleft", bty = "n", c("true fixed-effect term", "cgamm fit"),
	col = c(1, 2), lty = c(1, 2), lwd = c(2, 2))

# Example 2 (family = binomial).
# simulate a balanced data set with 20 clusters
# each cluster has 20 data points

  n <- 20
  m <- 20#
  N <- n*m

  # siga is the sd for the random intercept
  siga <- 1

# generate a group factor
  group <- trunc(0:((m*n)-1)/n)+1
  group <- factor(group)

# generate the random-intercept term asscosiated with each group
  avals <- rnorm(m,0,siga)

# generate the fixed-effect mean term: mu, systematic term: eta and the response: y
  x <- runif(m*n)
  mu <- 1:(m*n)
  y <- 1:(m*n)

  eta <- 2 * (1 + tanh(7 * (x - .8))) - 2
  eta0 <- eta
  for(i in 1:m){eta[group == i] <- eta[group == i] + avals[i]}
  for(i in 1:m){mu[group == i] <- 1 - 1 / (1 + exp(eta[group == i]))}
  for(i in 1:m){y[group == i] <- rbinom(n, size = 1, prob = mu[group == i])}
  dat <- data.frame(x = x, y = y, group = group)
  ansc <- cgamm(y ~ s.incr.conv(x) + (1|group),
  family = binomial(link = "logit"), reml = FALSE, data = dat)
  summary(ansc)
  anova(ansc)

Colorado Forest Data Set

Description

This data set contains 9167 records of different species of live trees for 345 sampled forested plots measured in 2015.

Usage

data("COforest")

Format

A data frame with 9167 observations on the following 19 variables.

PLT_CN

Unique identifier of plot

STATECD

State code using Bureau of Census Federal Information Processing Standards (FIPS)

COUNTYCD

County code (FIPS)

ELEV_PUBLIC

Elevation (ft) extracted spatially using LON_PUBLIC/LAT_PUBLIC

LON_PUBLIC

Fuzzed longitude in decimal degrees using NAD83 datum

LAT_PUBLIC

Fuzzed latitude in decimal degrees using NAD83 datum

ASPECT

a numeric vector

SLOPE

a numeric vector

SUBP

Subplot number

TREE

Tree number within subplot

STATUSCD

Tree status (0:no status; 1:live tree; 2:dead tree; 3:removed)

SPCD

Species code

DIA

Current diameter (in)

HT

Total height (ft): estimated when broken or missing

ACTUALHT

Actual height (ft): measured standing or down

HTCD

Height method code (1:measured; 2:actual measured-length estimated; 3:actual and length estimated; 4:modeled

TREECLCD

Tree class code (2:growing-stock; 3:rough cull; 4:rotten cull)

CR

Compacted crown ratio (percentage)

CCLCD

Crown class (1:open grown; 2:domimant; 3:co-dominant; 4:intermediate; 5:overtopped)

Source

It is provided by Forest Inventory Analysis (FIA) National Program.

References

X. Liao and M. Meyer (2019). Estimation and Inference in Mixed-Effect Regression Models using Shape Constraints, with Application to Tree Height Estimation. (to appear in Journal of the Royal Statistical Society. Series C: Applied Statistics)

Examples

## Not run: 
  library(dplyr)
  library(tidyr)
  data(COforest)

  #re-grouping classes of CCLCD:
  #combine dominant (2) and co-dominant (3)
  #combine intermediate (4) and overtopped (5)
  COforest = COforest 
    mutate(CCLCD = replace(CCLCD, CCLCD == 5, 4))

  #make a list of species, each element is a small data frame for one species
  species = COforest 

  #get the subset for quaking aspen, which is the 4th element in the species list
  sub = species$data[[4]]
  #for quaking aspen, there are only two crown classes: dominant/co-dominant
  #and intermediate/overtopped
  table(sub$CCLCD)
  #   3    4
  #1400  217
  #for quaking aspen, there are only two tree clases: growing-stock and rough cull
  table(sub$TREECLCD)
  #2    3
  #1591   26

  #fit the model
  ansc = cgamm(log(HT)~s.incr.conc(DIA)+factor(CCLCD)+factor(TREECLCD)
  +(1|PLT_CN), reml=TRUE, data=sub)

  #check which classes are significant
  summary(ansc)

  #fixed-effect 95
  newData = data.frame(DIA=sub$DIA,CCLCD=sub$CCLCD,TREECLCD=sub$TREECLCD)
  pfit = predict(ansc, newData,interval='confidence')
  lower = pfit$lower
  upper = pfit$upper
  #we need to use exp(muhat) later in the plot
  muhat = pfit$fit

  #get x and y
  x = sub$DIA
  y = sub$HT

  #get TREECLCD and CCLCD
  z1 = sub$TREECLCD
  z2 = sub$CCLCD

  #plot fixed-effect confidence intervals
  plot(x, y, xlab='DIA (m)', ylab='HT (m)', ylim=c(min(y),max(exp(upper))+10),type='n')
  lines(sort(x[z2==3&z1==2]), (exp(pfit$fit)[z2==3&z1==2])[order(x[z2==3&z1==2])],
  col='slategrey', lty=1, lwd=2)
  lines(sort(x[z2==3&z1==2]), (exp(pfit$lower)[z2==3&z1==2])[order(x[z2==3&z1==2])],
  col='slategrey', lty=1, lwd=2)
  lines(sort(x[z2==3&z1==2]), (exp(pfit$upper)[z2==3&z1==2])[order(x[z2==3&z1==2])],
  col='slategrey', lty=1, lwd=2)

  lines(sort(x[z2==4&z1==2]), (exp(pfit$fit)[z2==4&z1==2])[order(x[z2==4&z1==2])],
  col="blueviolet", lty=2, lwd=2)
  lines(sort(x[z2==4&z1==2]), (exp(pfit$lower)[z2==4&z1==2])[order(x[z2==4&z1==2])],
  col="blueviolet", lty=2, lwd=2)
  lines(sort(x[Cz2==4&z1==2]), (exp(pfit$upper)[Cz2==4&z1==2])[order(x[Cz2==4&z1==2])],
  col="blueviolet", lty=2, lwd=2)

  lines(sort(x[Cz2==3&z1==3]), (exp(pfit$fit)[Cz2==3&z1==3])[order(x[Cz2==3&z1==3])],
  col=3, lty=3, lwd=2)
  lines(sort(x[Cz2==3&z1==3]), (exp(pfit$lower)[Cz2==3&z1==3])[order(x[Cz2==3&z1==3])],
  col=3, lty=3, lwd=2)
  lines(sort(x[Cz2==3&z1==3]), (exp(pfit$upper)[Cz2==3&z1==3])[order(x[Cz2==3&z1==3])],
  col=3, lty=3, lwd=2)

  lines(sort(x[Cz2==4&z1==3]), (exp(pfit$fit)[Cz2==4&z1==3])[order(x[Cz2==4&z1==3])],
  col=2, lty=4, lwd=2)
  lines(sort(x[Cz2==4&z1==3]), (exp(pfit$lower)[Cz2==4&z1==3])[order(x[Cz2==4&z1==3])],
  col=2, lty=4, lwd=2)
  lines(sort(x[Cz2==4&z1==3]), (exp(pfit$upper)[Cz2==4&z1==3])[order(x[Cz2==4&z1==3])],
  col=2, lty=4, lwd=2)
  legend('bottomright', bty='n', cex=.9,c('dominant/co-dominant and growing stock',
  'intermediate/overtopped and growing stock','dominant/co-dominant and rough cull',
  'intermediate/overtopped and rough cull'), col=c('slategrey',"blueviolet",3,2),
  lty=c(1,2,3,4),lwd=c(2,2,2,2), pch=c(24,23,22,21))
  title('Quaking Aspen fits with 95

## End(Not run)

Specify a Concave Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is concave in a predictor in a formula argument to cgam. This is the unsmoothed version.

Usage

conc(x, numknots = 0, knots = 0, space = "E")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

knots

The knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

space

A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E".

Details

"conc" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "conc"; the shape attribute is 4("concave"), and according to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component η\eta and "x" to be concave, will be made. The cone edges are a set of basis employed in the hinge algorithm.

Note that "conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 4("concave"); numknots: the numknots argument in "conc"; knots: the knots argument in "conc"; space: the space argument in "conc".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

See Also

conv

Examples

# generate y
  x <- seq(-1, 2, by = 0.1)
  n <- length(x)
  y <- - x^2 + rnorm(n, .3)  

  # regress y on x under the shape-restriction: "concave"
  ans <- cgam(y ~ conc(x))

  # make a plot
  plot(x, y)
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "concave fit", col = 2, lty = 1)

Specify a Convex Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is convex in a predictor in a formula argument to cgam. This is the unsmoothed version.

Usage

conv(x, numknots = 0, knots = 0, space = "E")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

knots

The knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

space

A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E".

Details

"conv" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "conv"; the shape attribute is 3("convex"), and according to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component η\eta and "x" to be convex, will be made. The cone edges are a set of basis employed in the hinge algorithm.

Note that "conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 3("convex"); numknots: the numknots argument in "conv"; knots: the knots argument in "conv" ; space: the space argument in "conv".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

See Also

conc

Examples

# generate y
  x <- seq(-1, 2, by = 0.1)
  n <- length(x)
  y <- x^2 + rnorm(n, .3)  

  # regress y on x under the shape-restriction: "convex"
  ans <- cgam(y ~ conv(x))

  # make a plot
  plot(x, y)
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "convex fit", col = 2, lty = 1)

A Data Set for Cgam

Description

This data set is used for several examples in the cgam package.

Usage

data(cubic)

Format

A data frame with 50 observations on the following 2 variables.

x

The predictor vector.

y

The response vector.

Source

STAT640 HW 14 given by Dr. Meyer.


Specify a Decreasing Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is decreasing in a predictor in a formula argument to cgam. This is the unsmoothed version.

Usage

decr(x, numknots = 0, knots = 0, space = "E")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

knots

The knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

space

A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E".

Details

"decr" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "decr"; the shape attribute is 2("decreasing"), and according to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component η\eta and "x" to be decreasing, will be made. The cone edges are a set of basis employed in the hinge algorithm.

Note that "decr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 2("decreasing"); numknots: the numknots argument in "decr"; knots: the knots argument in "decr"; space: the space argument in "decr".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

See Also

decr.conc, decr.conv

Examples

data(cubic)

  # extract x
  x <- - cubic$x

  # extract y
  y <- cubic$y

  # regress y on x with the shape restriction: "decreasing"
  ans <- cgam(y ~ decr(x))

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, ans$muhat, col = 2)
  legend("bottomright", bty = "n", "decreasing fit", col = 2, lty = 1)

Specify a Decreasing and Concave Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is decreasing and concave in a predictor in a formula argument to cgam. This is the unsmoothed version.

Usage

decr.conc(x, numknots = 0, knots = 0, space = "E")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

knots

The knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

space

A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E".

Details

"decr.conc" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "decr.conc"; the shape attribute is 8("decreasing and concave"), and according to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component η\eta and "x" to be decreasing and concave, will be made. The cone edges are a set of basis employed in the hinge algorithm.

Note that "decr.conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 8("decreasing and concave"); numknots: the numknots argument in "decr.conc"; knots: the knots argument in "decr.conc"; space: the space argument in "decr.conc".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

See Also

decr.conv, decr

Examples

data(cubic)

  # extract x
  x <-  cubic$x

  # extract y
  y <- - cubic$y

  # regress y on x with the shape restriction: "decreasing" and "concave"
  ans <- cgam(y ~ decr.conc(x))

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "decreasing and concave fit", col = 2, lty = 1)

Specify a Decreasing and Convex Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is decreasing and convex in a predictor in a formula argument to cgam. This is the unsmoothed version.

Usage

decr.conv(x, numknots = 0, knots = 0, space = "E")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

knots

The knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

space

A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E".

Details

"decr.conv" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "decr.conv"; the shape attribute is 6("decreasing and convex"), and according to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component η\eta and "x" to be decreasing and convex, will be made. The cone edges are a set of basis employed in the hinge algorithm.

Note that "decr.conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 6("decreasing and convex"); numknots: the numknots argument in "decr.conv"; knots: the knots argument in "decr.conv"; space: the space argument in "decr.conv".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

See Also

decr.conc, decr

Examples

data(cubic)

  # extract x
  x <- - cubic$x

  # extract y
  y <- cubic$y

  # regress y on x with the shape restriction: "decreasing" and "convex"
  ans <- cgam(y ~ decr.conv(x))

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, ans$muhat, col = 2)
  legend("bottomright", bty = "n", "decreasing and convex fit", col = 2, lty = 1)

To Include a Non-Parametrically Modelled Predictor in a SHAPESELECT Formula

Description

A symbolic routine to indicate that a predictor is included as a non-parametrically modeled predictor in a formula argument to ShapeSelect.

Usage

in.or.out(z)

Arguments

z

A non-parametrically modelled predictor which has the same length as the response vector.

Details

To include a categorical predictor, in.or.out(factor(z)) is used, and to include a linear predictor z, in.or.out(z) is used. If in.or.out is not used, the user can include z in a model by adding z or factor(z) in a ShapeSelect formula.

Value

The vector z with three attributes, i.e., nm: the name of z; shape: 1 or 0 (in or out of the model); type: "fac" or "lin", i.e., z is modelled as a categorical predictor or a linear predictor.

Author(s)

Xiyue Liao

See Also

shapes, ShapeSelect

Examples

## Not run: 
  n <- 100
  # x is a continuous predictor 
  x <- runif(n)
  
  # generate z and to include it as a categorical predictor
  z <- rep(0:1, 50)

  # y is generated as correlated to both x and z
  # the relationship between y and x is smoothly increasing-convex
  y <- x^2 + 2 * I(z == 1) + rnorm(n, sd = 1)

  # call ShapeSelect to find the best model by the genetic algorithm
  # factor(z) may be in or out of the model  
  fit <- ShapeSelect(y ~ shapes(x) + in.or.out(factor(z)), genetic = TRUE)

  # factor(z) isn't chosen and is included in the model
  fit <- ShapeSelect(y ~ shapes(x) + factor(z), genetic = TRUE)

## End(Not run)

Specify an Increasing Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is increasing in a predictor in a formula argument to cgam. This is the unsmoothed version.

Usage

incr(x, numknots = 0, knots = 0, space = "E")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

knots

The knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

space

A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E".

Details

"incr" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "incr"; the shape attribute is 1("increasing"), and according to the value of the vector itself and its attributes, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component η\eta and "x" to be increasing, will be made. The cone edges are a set of basis employed in the hinge algorithm.

Note that "incr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 1("increasing"); numknots: the numknots argument in "incr"; knots: the knots argument in "incr"; space: the space argument in "incr".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

See Also

incr.conc, incr.conv

Examples

data(cubic)

  # extract x
  x <- cubic$x

  # extract y
  y <- cubic$y

  # regress y on x with the shape restriction: "increasing"
  ans <- cgam(y ~ incr(x))

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "increasing fit", col = 2, lty = 1)

Specify an Increasing and Concave Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is increasing and concave in a predictor in a formula argument to cgam. This is the unsmoothed version.

Usage

incr.conc(x, numknots = 0, knots = 0, space = "E")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

knots

The knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

space

A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E".

Details

"incr.conc" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "incr.conc"; the shape attribute is 7("increasing and concave"), and according to the value of the vector itself and its attributes, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component η\eta and "x" to be increasing and concave, will be made. The cone edges are a set of basis employed in the hinge algorithm.

Note that "incr.conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 7("increasing and concave"); numknots: the numknots argument in "incr.conc"; knots: the knots argument in "incr.conc"; space: the space argument in "incr.conc".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

See Also

incr.conv

Examples

data(cubic)

  # extract x
  x <- - cubic$x

  # extract y
  y <- - cubic$y

  # regress y on x with the shape restriction: "increasing" and "concave"
  ans <- cgam(y ~ incr.conc(x))

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "increasing and concave fit", col = 2, lty = 1)

Specify an Increasing and Convex Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is increasing and convex in a predictor in a formula argument to cgam. This is the unsmoothed version.

Usage

incr.conv(x, numknots = 0, knots = 0, space = "E")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

knots

The knots used to smoothly constrain a predictor. The value should be 00 for a shape-restricted predictor without smoothing. The default value is 00.

space

A character specifying the method to create knots. It will not be used for a shape-restricted predictor without smoothing. The default value is "E".

Details

"incr.conv" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "incr.conv"; the shape attribute is 5("increasing and convex"), and according to the value of the vector itself and its attributes, the cone edges of the cone generated by the constraint matrix, which constrains the relationship between the systematic component η\eta and "x" to be increasing and convex, will be made. The cone edges are a set of basis employed in the hinge algorithm.

Note that "incr.conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 5("increasing and convex"); numknots: the numknots argument in "incr.conv"; knots: the knots argument in "incr.conv"; space: the space argument in "incr.conv".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

See Also

incr.conc, incr

Examples

data(cubic)

  # extract x
  x <- cubic$x

  # extract y
  y <- cubic$y

  # regress y on x with the shape restriction: "increasing" and "convex"
  ans <- cgam(y ~ incr.conv(x))

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "increasing and convex fit", col = 2, lty = 1)

Alachua County Study of Mental Impairment

Description

The date set is from a study of mental health for a random sample of 40 adult residents of Alachua County, Florida. Mental impairment is an ordinal response with 4 categories: well, mild symptom formation, moderate symptom formation, and impaired, which are recorded as 1, 2, 3, and 4. Life event index is a composite measure of the number and severity of important life events that occurred with the past three years, e.g., birth of a child, new job, divorce, or death of a family member. It is an integer from 0 to 9. Another covariate is socio-economic status and it is measured as binary: high = 1, low = 0.

Usage

data(mental)

Format

mental

Mental impairment. It is an ordinal response with 44 categories recorded as 11, 22, 33, and 44.

ses

Socio-economic status measured as binary: high = 11, low = 00.

life

Life event index. It is an integer from 00 to 99.

References

Agresti, A. (2010) Analysis of Ordinal Categorical Data, 2nd ed. Hoboken, NJ: Wiley.

See Also

Ord

Examples

# proportional odds model example 
	data(mental)
	
	# model the relationship between the latent variable and life event index as increasing
	# socio-economic status is included as a binary covariate 
	fit.incr <- cgam(mental ~ incr(life) + ses, data = mental, family = Ord)

	# check the estimated probabilities P(mental = k), k = 1, 2, 3, 4
	probs.incr <- fitted(fit.incr)
	head(probs.incr)

Specify an Ordered Categorical Family in a CGAM Formula

Description

This is a subroutine to specify an ordered catergorical family in a cgam formula. It set things up to a routine called cgam.polr. This is learned from the polr routine in the MASS package, which fits a logistic or probit regression model to an ordered categorical response. Currently only the logistic regression model is allowed.

Usage

Ord(link = "identity")

Arguments

link

The link function. Users don't need specify this term.

Details

See the polr section in the official manual of the MASS package (https://cran.r-project.org/package=MASS) for details.

Value

muhat

The estimated expected value of a latent variable.

zeta

Estimated cut-points defining the intervals of a latent variable such that the latent variable is between two adjacent cut-points is equivalent to that the ordered categorical response is in a category.

Author(s)

Xiyue Liao

References

Agresti, A. (2002) Categorical Data. Second edition. Wiley.

See Also

mental

Examples

## Not run: 
	# Example 1. 
	# generate the predictor and the latenet variable
	n <- 500
	set.seed(123)
	x <- runif(n, 0, 1)
	yst <- 5*x^2 + rlogis(n)
	
	# generate observed ordered response, which has levels 1, 2, 3. 
	cts <- quantile(yst, probs = seq(0, 1, length = 4)) 
	yord <- cut(yst, breaks = cts, include.lowest = TRUE, labels = c(1:3), Ord = TRUE)
	y <- as.numeric(levels(yord))[yord] 

	# regress y on x under the shape-restriction: the latent variable is "increasing-convex"
	# w.r.t x
	ans <- cgam(y ~ s.incr.conv(x), family = Ord)
	
	# check the estimated cut-points
	ans$zeta
	
	# check the estimated expected value of the latent variable
	head(ans$muhat)
	
	# check the estimated probabilities P(y = k), k = 1, 2, 3
	head(fitted(ans))
	
	# check the estimated latent variable
	plot(x, yst, cex = 1, type = "n", ylab = "latent variable")
	cols <- topo.colors(3)
	for (i in 1:3) {
		points(x[y == i], yst[y == i], col = cols[i], pch = i, cex = 0.7)
	}
	for (i in 1:2) {
		abline(h = (ans$zeta)[i], lty = 4, lwd = 1)
	}
	lines(sort(x), (5*x^2)[order(x)], lwd = 2)
	lines(sort(x), (ans$muhat)[order(x)], col = 2, lty = 2, lwd = 2)
	legend("topleft", bty = "n", col = c(1, 2), lty = c(1, 2), 
	c("true latent variable", "increasing-convex fit"), lwd = c(1, 1))

## End(Not run)	
## Not run: 
	# Example 2. mental impairment data set 
	# mental impairment is an ordinal response with 4 categories recorded as 1, 2, 3, and 4
	# two covariates are life event index and socio-economic status (high = 1, low = 0)
	data(mental)
	table(mental$mental)
	
	# model the relationship between the latent variable and life event index as increasing
	# socio-economic status is included as a binary covariate 
	fit.incr <- cgam(mental ~ incr(life) + ses, data = mental, family = Ord)

	# check the estimated probabilities P(mental = k), k = 1, 2, 3, 4
	probs.incr <- fitted(fit.incr)
	head(probs.incr)

## End(Not run)

A Data Set for Cgam

Description

This data set is used for the routine plotpersp. It contains 314 observations of blood plasma, beta carotene measurements along with several covariates. High levels of blood plasma and beta carotene are believed to be protective against cancer, and it is of interest to determine the relationships with covariates.

Usage

data(plasma)

Format

logplasma

A numeric vector of the logarithm of plasma levels.

betacaro

A numeric vector of dietary beta carotene consumed mcg per day.

bmi

A numeric vector of BMI values.

cholest

A numeric vector of cholesterol consumed mg per day.

dietfat

A numeric vector of the logarithm of grams of diet fat consumed per day.

fiber

A numeric vector of grams of fiber consumed per day.

retinol

A numeric vector of retinol consumed per day.

smoke

A numeric vector of smoking status (1=Never, 2=Former, 3=Current Smoker).

vituse

A numeric vector of vitamin use (1=Yes, fairly often, 2=Yes, not often, 3=No).

References

Nierenberg, D.,Stukel, T.,Baron, J.,Dain, B.,and Greenberg, E. (1989) Determinants of plasma levels of beta-carotene and retinol. American Journal of Epidemiology 130, 511–521.

Examples

data(plasma)

Create a 3D Plot for a CGAM Object

Description

Given an object of the cgam class, which has at least two non-parametrically modelled predictors, this routine will make a 3D plot of the fit with a set of two non-parametrically modelled predictors in the formula being the x and y labs. If there are more than two non-parametrically modelled predictors, any other such predictor will be evaluated at the largest value which is smaller than or equal to its median value.

If there is any categorical covariate and if the user specifies the argument categ to be a character representing a categorical covariate in the formula, then a 3D plot with multiple parallel surfaces, which represent the levels of a categorical covariate in an ascending order, will be created; otherwise, a 3D plot with only one surface will be created. Each level of a categorical covariate will be evaluated at its mode.

This routine is extended to make a 3D plot for an object fitted by warped-plane splines or triangle splines. Note that two non-parametrically modelled predictors specified in this routine must both be modelled as addtive components, or a pair of predictors forming an isotonic or convex surface without additivity assumption.

This routine is an extension of the generic R graphics routine persp. See the documentation below for more details.

Usage

plotpersp(object,...)

Arguments

object

An object of the cgam class with at least two non-parametrically modelled predictors.

...

Arguments to be passed to the S3 method for the cgam class:

  • x1A non-parametrically modelled predictor in a cgam fit. If the user omits x1 and x2, then the first two non-parametric predictors in a cgam formula will be used.

  • x2A non-parametrically modelled predictor in a cgam fit. If the user omits x1 and x2, then the first two non-parametric predictors in a cgam formula will be used.

  • x1nm: Character name of x1.

  • x2nm: Character name of x2.

  • data: The data frame based on which the user get a cgam fit.

  • surface: The type of the surface of a 3D plot. For a cgam fit, if surface == "mu", then the surface of the estimated mean value of the fit will be plotted; if surface == "eta", then the surface of the estimated systematic component value of the fit will be plotted. The default is surface = "mu"; for a warped-plane spline fit, if surface == "C", then the surface of the constrained estimated mean value of the fit will be plotted, while if surface == "U", then the surface of the unconstrained estimated mean value of the fit will be plotted. The default is surface = "C".

  • categ: Optional categorical covariate(s) in a cgam fit. If there is any categorical covariate and if the user specifies the argument categ to be a character representing a categorical covariate in the formula, then a 3D plot with multiple parallel surfaces, which represent the levels of a categorical covariate in an ascending order, will be created; otherwise, a 3D plot with only one surface will be created. Each level of a categorical covariate will be evaluated at its mode. The default is categ = NULL.

  • col: The color(s) of a 3D plot created by plotpersp. If col == NULL, "white" will be used when there is only one surface in the plot, and a sequence of colors will be used in a fixed order when there are multiple parallel surfaces in the plot. For example, when there are two surfaces, the lower surface will be in the color "peachpuff", and the higher surface will be in the color "lightblue". The default is col = NULL.

  • random: A logical scalar. If random == TRUE, color(s) for a 3D plot will be randomly chosen from ten colors, namely, "peachpuff", "lightblue", "limegreen", "grey", "wheat", "yellowgreen", "seagreen1", "palegreen", "azure", "whitesmoke"; otherwise, "white" will be used when there is only one surface in the plot, and a sequence of colors will be used in a fixed order when there are multiple parallel surfaces in the plot.

  • ngrid: This is a positive integer specifying how dense the xx grid and the yy grid will be. The default is ngrid = 1212. Note that this argument is only used for a cgam fit.

  • xlim: The xlim argument inherited from the persp routine.

  • ylim: The ylim argument inherited from the persp routine.

  • zlim: The zlim argument inherited from the persp routine.

  • xlab: The xlab argument inherited from the persp routine.

  • ylab: The ylab argument inherited from the persp routine.

  • zlab: The zlab argument inherited from the persp routine.

  • main: The main argument inherited from the persp routine.

  • th: The theta argument inherited from the persp routine.

  • ltheta: The ltheta argument inherited from the persp routine.

  • main: The main argument inherited from the persp routine.

  • ticktype: The ticktype argument inherited from the persp routine.

Value

The routine plotpersp returns a 3D plot of an object of the cgam class. The xx lab and yy lab represent a set of non-parametrically modelled predictors used in a cgam formula, and the zz lab represents the estimated mean value or the estimated systematic component value.

Author(s)

Mary C. Meyer and Xiyue Liao

Examples

# Example 1.
  data(FEV)

  # extract the variables
  y <- FEV$FEV
  age <- FEV$age
  height <- FEV$height
  sex <- FEV$sex
  smoke <- FEV$smoke

  fit <- cgam(y ~ incr(age) + incr(height) + factor(sex) + factor(smoke), nsim = 0)
  fit.s <- cgam(y ~ s.incr(age) + s.incr(height) + factor(sex) + factor(smoke), nsim = 0)

  plotpersp(fit, age, height, ngrid = 10, main = "Cgam Increasing Fit", 
  sub = "Categorical Variable: Sex", categ = "factor(sex)")
  plotpersp(fit.s, age, height, ngrid = 10, main = "Cgam Smooth Increasing Fit", 
  sub = "Categorical Variable: Smoke", categ = "factor(smoke)")

# Example 2.
  data(plasma)

  # extract the variables
  y <- plasma$logplasma
  bmi <- plasma$bmi
  logdietfat <- plasma$logdietfat
  cholest <- plasma$cholest
  fiber <- plasma$fiber
  betacaro <- plasma$betacaro
  retinol <- plasma$retinol
  smoke <- plasma$smoke
  vituse <- plasma$vituse

  fit <- cgam(y ~  s.decr(bmi) + s.decr(logdietfat) + s.decr(cholest) + s.incr(fiber) 
+ s.incr(betacaro) + s.incr(retinol) + factor(smoke) + factor(vituse)) 

  plotpersp(fit, bmi, logdietfat, ngrid = 15, th = 120, ylab = "log(dietfat)", 
zlab = "est mean of log(plasma)", main = "Cgam Fit with the Plasma Data Set", 
sub = "Categorical Variable: Vitamin Use", categ = "factor(vituse)")

# Example 3.
  data(plasma)
  addl <- 1:314*0 + 1 
  addl[runif(314) < .3] <- 2
  addl[runif(314) > .8] <- 4
  addl[runif(314) > .8] <- 3

  ans <- cgam(logplasma ~ s.incr(betacaro, 5) + s.decr(bmi) + s.decr(logdietfat) 
+ as.factor(addl), data = plasma)
  plotpersp(ans, betacaro, logdietfat, th = 240, random = TRUE, 
categ = "as.factor(addl)", data = plasma)

# Example 4.
## Not run: 
  n <- 100
  set.seed(123)
  x1 <- sort(runif(n))
  x2 <- sort(runif(n))
  y <- 4 * (x1 - x2) + rnorm(n, sd = .5)

  # regress y on x1 and x2 under the shape-restriction: "decreasing-increasing"
  # with a penalty term = .1
  ans <- cgam(y ~ s.decr.incr(x1, x2), pen = .1)

# plot the constrained surface
  plotpersp(ans)
 
## End(Not run)

Predict Method for CGAM Fits

Description

Predicted values based on a cgam object

Usage

## S3 method for class 'cgam'
predict(object, newData, interval = c("none", "confidence", "prediction"), 
type = c("response", "link"), level = 0.95, n.mix = 500,...)

Arguments

object

A cgam object.

newData

A data frame in which to look for variables with which to predict. If omitted, the fitted values are used.

interval

Type of interval calculation. A prediction interval is only implemented for Gaussian response for now.

type

If the response is Gaussian, type = "response" gives the predicted mean; if the response is binomial, type = "response" gives the predicted probabilities, and type = "link" gives the predicted systematic component.

level

Tolerance/confidence level.

n.mix

Number of simulations to get the mixture distribution. The default is n.mix = 500.

...

Further arguments passed to the routine.

Details

Constrained spline estimators can be characterized by projections onto a polyhedral convex cone. Point-wise confidence intervals for constrained splines are constructed by estimating the probabilities that the projection lands on each of the faces of the cone, and using a mixture of covariance matrices to estimate the standard error of the function estimator at any design point.

Note that currently predict.cgam only works when all components in a cgam formula are additive.

See references cited in this section for more details.

Value

fit

A vector of predictions.

lower

A vector of lower bound if interval is set to be "confidence".

upper

A vector of upper bound if interval is set to be "confidence".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2017) Constrained partial linear regression splines. Statistical Sinica in press.

Meyer, M. C. (2017) Confidence intervals for regression functions using constrained splines with application to estimation of tree height

Meyer, M. C. (2012) Constrained penalized splines. Canadian Journal of Statistics 40(1), 190–206.

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

Examples

# Example 1.
	# generate data
	n <- 100
	set.seed(123)
	x <- runif(n)
	y <- 4*x^3 + rnorm(n)

	# regress y on x under the shape-restriction: "increasing-convex"
	fit <- cgam(y ~ s.incr.conv(x))
	
	# make a data frame 
	x0 <- seq(min(x), max(x), by = 0.05)
	new.Data <- data.frame(x = x0)

	# predict values in new.Data based on the cgam fit without a confidence interval
	pfit <- predict(fit, new.Data)
	
	# or 
	pfit <- predict(fit, new.Data, interval = "none")
	
	# make a plot to check the prediction
	plot(x, y, main = "Predict Method for CGAM")
	lines(sort(x), (fitted(fit)[order(x)]))
	points(x0, pfit$fit, col = 2, pch = 20)

	# predict values in new.Data based on the cgam fit with a 95 percent confidence interval
	pfit <- predict(fit, new.Data, interval = "confidence", level = 0.95)

	# make a plot to check the prediction
	plot(x, y, main = "Pointwise Confidence Bands (Gaussian Response)")
	lines(sort(x), (fitted(fit)[order(x)]))
	lines(sort(x0), (pfit$lower)[order(x0)], col = 2, lty = 2)
	lines(sort(x0), (pfit$upper)[order(x0)], col = 2, lty = 2)
	points(x0, pfit$fit, col = 2, pch = 20)

# Example 2. binomial response
	n <- 200
	x <- seq(0, 1, length = n)

	eta <- 4*x - 2
	mu <- exp(eta)/(1+exp(eta))
	set.seed(123)
	y <- 1:n*0
	y[runif(n)<mu] = 1
        
	fit <- cgam(y ~ s.incr.conv(x), family = binomial)
	muhat <- fitted(fit)
	
    # predict values in new.Data based on the cgam fit with a 95 percent confidence interval
    xinterp <- seq(min(x), max(x), by = 0.05)
    new.Data <- data.frame(x = xinterp)
	pfit <- predict(fit, new.Data, interval = "confidence", level = 0.95)
	pmu <- pfit$fit
	lwr <- pfit$lower
	upp <- pfit$upper
	
	# make a plot to check the prediction	
	plot(x, y, type = "n", ylim = c(0, 1), 
	main = "Pointwise Confidence Bands (Binomial Response)")
	rug(x[y == 0])
    rug(x[y == 1], side = 3)   
	lines(x, mu)
	lines(x, muhat, col = 5, lty = 2)
       
	points(xinterp, pmu, pch = 20)
	lines(xinterp, upp, col = 5)
	points(xinterp, upp, pch = 20)
	lines(xinterp, lwr, col = 5)
	points(xinterp, lwr, pch = 20)

Specify a Smooth Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is smooth in a predictor in a formula argument to cgam. This is the smooth version.

Usage

s(x, numknots = 0, knots = 0, space = "Q")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to constrain xx. It will not be used if the user specifies the knots argument. The default is numknots = 00.

knots

The knots used to constrain xx. User-defined knots will be used when given. Otherwise, numknots and space will be used to create knots. The default is knots = 00.

space

A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal xx quantiles will be created based on xx with duplicate elements removed. The number of knots is numknots when numknots >0> 0. Otherwise it is of the order n1/7n^{1/7}. The default is space = "Q".

Details

"s" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s"; the shape attribute is 17("smooth"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 17("smooth"); numknots: the numknots argument in "s"; knots: the knots argument in "s"; space: the space argument in "s".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

See Also

s.incr, s.decr, s.conc, s.conv, s.incr.conc, s.incr.conv, s.decr.conc, s.decr.conv

Examples

# generate y
  x <- seq(-1, 2, by = 0.1)
  n <- length(x)
  y <- - x^2 + rnorm(n, .3)

  # regress y on x under the shape-restriction: "smooth"
  ans <- cgam(y ~ s(x))
  knots <- ans$knots[[1]]

  # make a plot
  plot(x, y)
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "smooth fit", col = 2, lty = 1)
  legend(1.6, 1.8, bty = "o", "knots", pch = "X")
  points(knots, 1:length(knots)*0+min(y), pch = "X")

Specify a Smooth and Concave Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is smooth and concave in a predictor in a formula argument to cgam. This is the smooth version.

Usage

s.conc(x, numknots = 0, knots = 0, space = "Q")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to constrain xx. It will not be used if the user specifies the knots argument. The default is numknots = 00.

knots

The knots used to constrain xx. User-defined knots will be used when given. Otherwise, numknots and space will be used to create knots. The default is knots = 00.

space

A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal xx quantiles will be created based on xx with duplicate elements removed. The number of knots is numknots when numknots >0> 0. Otherwise it is of the order n1/7n^{1/7}. The default is space = "Q".

Details

"s.conc" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.conc"; the shape attribute is 12("smooth and concave"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 12("smooth and concave"); numknots: the numknots argument in "s.conc"; knots: the knots argument in "s.conc"; space: the space argument in "s.conc".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

See Also

conc

Examples

# generate y
  x <- seq(-1, 2, by = 0.1)
  n <- length(x)
  y <- - x^2 + rnorm(n, .3)

  # regress y on x under the shape-restriction: "smooth and concave"
  ans <- cgam(y ~ s.conc(x))
  knots <- ans$knots[[1]]

  # make a plot
  plot(x, y)
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "smooth and concave fit", col = 2, lty = 1)
  legend(1.6, 1.8, bty = "o", "knots", pch = "X")
  points(knots, 1:length(knots)*0+min(y), pch = "X")

Specify a Doubly-Concave Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that a surface is concave in two predictors in a formula argument to cgam.

Usage

s.conc.conc(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))

Arguments

x1

A numeric predictor which has the same length as the response vector.

x2

A numeric predictor which has the same length as the response vector.

numknots

A vector of the number of knots used to constrain x1x_1 and x2x_2. It will not be used if the user specifies the knots argument and each predictor is within the range of its knots. The default is numknots = c(0, 0).

knots

A list of two vectors of knots used to constrain x1x_1 and x2x_2. User-defined knots will be used if each predictor is within the range of its knots. Otherwise, numknots and space will be used to create knots. The default is knots = list(k1 = 0, k2 = 0).

space

A vector of the character specifying the method to create knots for x1x_1 and x2x_2. It will not be used if the user specifies the knots argument. If "E" is used, then equally spaced knots will be created; if "Q" is used, then a vector of equal quantiles will be created with duplicate elements removed. The number of knots is numknots when numknots is a positive integer >4> 4. Otherwise it is of the order n1/3n^{1/3}. The default is space = c("E", "E").

Details

"s.conc.conc" returns the vectors "x1" and "x2", and imposes on each vector six attributes: name, shape, numknots, knots, space and cvs.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.conc.conc"; the shape attribute is "tri_cvs"(doubly-concave); the cvs values for both vectors are FALSE. According to the value of the vector itself and its shape, numknots, knots, space and cvs attributes, the cone edges will be made by triangle spline basis functions in Meyer (2017). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.conc.conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called trispl.fit

See references cited in this section for more details.

Value

The vectors x1x_1 and x2x_2. Each of them has six attributes, i.e., name: names of x1x_1 and x2x_2; shape: "tri_cvs"(doubly-concave); numknots: the numknots argument in "s.conc.conc"; knots: the knots argument in "s.conc.conc"; space: the space argument in "s.conc.conc"; cvs: two logical values indicating the monotonicity of the isotonically-constrained surface with respect to x1x_1 and x2x_2, which are both FALSE.

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2017) Estimation and inference for regression surfaces using shape-constrained splines.

See Also

s.conv.conv, cgam

Examples

# generate data
	n <- 200
	set.seed(123)
	x1 <- runif(n); x2 <- runif(n)
	y <- -(x1 - 1)^2 - (x2 - 3)^2 + rnorm(n)
   
    # regress y on x1 and x2 under the shape-restriction: "doubly-concave" 
    ans <- cgam(y ~ s.conc.conc(x1, x2), nsim = 0)
    # make a 3D plot of the constrained surface
    plotpersp(ans)

Specify a Smooth and Convex Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is smooth and convex in a predictor in a formula argument to cgam. This is the smooth version.

Usage

s.conv(x, numknots = 0, knots = 0, space = "Q")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to constrain xx. It will not be used if the user specifies the knots argument. The default is numknots = 00.

knots

The knots used to constrain xx. User-defined knots will be used when given. Otherwise, numknots and space will be used to create knots. The default is knots = 00.

space

A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal xx quantiles will be created based on xx with duplicate elements removed. The number of knots is numknots when numknots >0> 0. Otherwise it is of the order n1/7n^{1/7}. The default is space = "Q".

Details

"s.conv" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.conv"; the shape attribute is 11("smooth and convex"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 11("smooth and convex"); numknots: the numknots argument in "s.conv"; knots: the knots argument in "s.conv"; space: the space argument in "s.conv".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

See Also

conv

Examples

# generate y
  x <- seq(-1, 2, by = 0.1)
  n <- length(x)
  y <- x^2 + rnorm(n, .3)

  # regress y on x under the shape-restriction: "smooth and convex"
  ans <- cgam(y ~ s.conv(x))
  knots <- ans$knots[[1]]

  # make a plot
  plot(x, y)
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "smooth and convex fit", col = 2, lty = 1)
  legend(1.6, -1, bty = "o", "knots", pch = "X")
  points(knots, 1:length(knots)*0+min(y), pch = "X")

Specify a Doubly-convex Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that a surface is convex in two predictors in a formula argument to cgam.

Usage

s.conv.conv(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))

Arguments

x1

A numeric predictor which has the same length as the response vector.

x2

A numeric predictor which has the same length as the response vector.

numknots

A vector of the number of knots used to constrain x1x_1 and x2x_2. It will not be used if the user specifies the knots argument and each predictor is within the range of its knots. The default is numknots = c(0, 0).

knots

A list of two vectors of knots used to constrain x1x_1 and x2x_2. User-defined knots will be used if each predictor is within the range of its knots. Otherwise, numknots and space will be used to create knots. The default is knots = list(k1 = 0, k2 = 0).

space

A vector of the character specifying the method to create knots for x1x_1 and x2x_2. It will not be used if the user specifies the knots argument. If "E" is used, then equally spaced knots will be created; if "Q" is used, then a vector of equal quantiles will be created with duplicate elements removed. The number of knots is numknots when numknots is a positive integer >4> 4. Otherwise it is of the order n1/3n^{1/3}. The default is space = c("E", "E").

Details

"s.conv.conv" returns the vectors "x1" and "x2", and imposes on each vector six attributes: name, shape, numknots, knots, space and cvs.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.conv.conv"; the shape attribute is "tri_cvs"(doubly-convex); the cvs values for both vectors are TRUE. According to the value of the vector itself and its shape, numknots, knots, space and cvs attributes, the cone edges will be made by triangle spline basis functions in Meyer (2017). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.conv.conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called trispl.fit

See references cited in this section for more details.

Value

The vectors x1x_1 and x2x_2. Each of them has six attributes, i.e., name: names of x1x_1 and x2x_2; shape: "tri_cvs"(doubly-convex); numknots: the numknots argument in "s.conv.conv"; knots: the knots argument in "s.conv.conv"; space: the space argument in "s.conv.conv"; cvs: two logical values indicating the monotonicity of the isotonically-constrained surface with respect to x1x_1 and x2x_2, which are both TRUE.

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2017) Estimation and inference for regression surfaces using shape-constrained splines.

See Also

s.conv.conv, cgam

Examples

# generate data
	n <- 200
	set.seed(123)
	x1 <- runif(n); x2 <- runif(n)
	y <- (x1 - 1)^2 + (x2 - 3)^2 + rnorm(n)
    
    # regress y on x1 and x2 under the shape-restriction: "doubly-convex" 
    ans <- cgam(y ~ s.conv.conv(x1, x2), nsim = 0)
    # make a 3D plot of the constrained surface
    plotpersp(ans)

Specify a Smooth and Decreasing Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is smooth and decreasing in a predictor in a formula argument to cgam. This is the smooth version.

Usage

s.decr(x, numknots = 0, knots = 0, var.knots = 0, space = "Q", db.exp = FALSE)

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to constrain xx. It will not be used if the user specifies the knots argument. The default is numknots = 00.

knots

The knots used to constrain xx. User-defined knots will be used when given. Otherwise, numknots and space will be used to create knots. The default is knots = 00.

var.knots

The knots used in variance function estimation. User-defined knots will be used when given. The default is var.knots = 00.

space

A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal xx quantiles will be created based on xx with duplicate elements removed. The number of knots is numknots when numknots >0> 0. Otherwise it is of the order n1/7n^{1/7}. The default is space = "Q".

db.exp

The parameter will be used in variance function estimation. If db.exp = TRUE, then the errors are assumed to follow a normal distribution; otherwise, the errors are assumed to follow a double-exponential distribution. The default is db.exp = FALSE.

Details

"s.decr" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots, space, var.knots and db.exp.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.decr"; the shape attribute is 10("smooth and decreasing"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by I-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.decr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

var.knots and db.exp will be used for monotonic variance function estimation.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e, name: the name of x; shape: 10("smooth and decreasing"); numknots: the numknots argument in "s.decr"; knots: the knots argument in "s.decr"; space: the space argument in "s.decr".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

See Also

decr

Examples

data(cubic)

  # extract x
  x <- - cubic$x

  # extract y
  y <- cubic$y

  # regress y on x under the shape-restriction: "smooth and decreasing"
  ans <- cgam(y ~ s.decr(x))
  knots <- ans$knots[[1]]

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "smooth and decreasing fit", col = 2, lty = 1)
  legend(-.3, 8, bty = "o", "knots", pch = "X")
  points(knots, 1:length(knots)*0+min(y), pch = "X")

Specify a Smooth, Decreasing and Concave Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is smooth, decreasing and concave in a predictor in a formula argument to cgam. This is the smooth version.

Usage

s.decr.conc(x, numknots = 0, knots = 0, space = "Q")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to constrain xx. It will not be used if the user specifies the knots argument. The default is numknots = 00.

knots

The knots used to constrain xx. User defined knots will be used when given. Otherwise, numknots and space will be used to create knots. The default is knots = 00.

space

A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal xx quantiles will be created based on xx with duplicate elements removed. The number of knots is numknots when numknots >0> 0. Otherwise it is of the order n1/7n^{1/7}. The default is space = "Q".

Details

"s.decr.conc" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.decr.conc"; the shape attribute is 16("smooth, decreasing and concave"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.decr.conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 16("smooth, decreasing and concave"); numknots: the numknots argument in "s.decr.conc"; knots: the knots argument in "s.decr.conc"; space: the space argument in "s.decr.conc".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

See Also

decr.conv, decr

Examples

data(cubic)

  # extract x
  x <-  cubic$x

  # extract y
  y <- - cubic$y

  # regress y on x under the shape-restriction: "smooth, decreasing and concave"
  ans <- cgam(y ~ s.decr.conc(x))
  knots <- ans$knots[[1]]

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "smooth, decreasing and concave fit", col = 2, lty = 1)
  legend(1.7, 4, bty = "o", "knots", pch = "X")
  points(knots, 1:length(knots)*0+min(y), pch = "X")

Specify a Smooth, Decreasing and Convex Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is smooth, decreasing and convex in a predictor in a formula argument to cgam. This is the smooth version.

Usage

s.decr.conv(x, numknots = 0, knots = 0, space = "Q")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to constrain xx. It will not be used if the user specifies the knots argument. The default is numknots = 00.

knots

The knots used to constrain xx. User-defined knots will be used when given. Otherwise, numknots and space will be used to create knots. The default is knots = 00.

space

A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal xx quantiles will be created based on xx with duplicate elements removed. The number of knots is numknots when numknots >0> 0. Otherwise it is of the order n1/7n^{1/7}. The default is space = "Q".

Details

"s.decr.conv" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.decr.conv"; the shape attribute is 15("smooth, decreasing and convex"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.decr.conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 15("smooth, decreasing and convex"); numknots: the numknots argument in "s.decr.conv"; knots: the knots argument in "s.decr.conv"; space: the space argument in "s.decr.conv".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

See Also

decr.conv

Examples

data(cubic)

  # extract x
  x <- - cubic$x

  # extract y
  y <- cubic$y

  # regress y on x under the shape-restriction: "smooth, decreasing and convex"
  ans <- cgam(y ~ s.decr.conv(x))
  knots <- ans$knots[[1]]

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "smooth, decreasing and convex fit", col = 2, lty = 1)
  legend(-.3, 9.2, bty = "o", "knots", pch = "X")
  points(knots, 1:length(knots)*0+min(y), pch = "X")

Specify a Doubly-Decreasing Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that a surface is decreasing in two predictors in a formula argument to cgam.

Usage

s.decr.decr(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))

Arguments

x1

A numeric predictor which has the same length as the response vector.

x2

A numeric predictor which has the same length as the response vector.

numknots

A vector of the number of knots used to constrain x1x_1 and x2x_2. It will not be used if the user specifies the knots argument and each predictor is within the range of its knots. The default is numknots = c(0, 0).

knots

A list of two vectors of knots used to constrain x1x_1 and x2x_2. User-defined knots will be used if each predictor is within the range of its knots. Otherwise, numknots and space will be used to create knots. The default is knots = list(k1 = 0, k2 = 0).

space

A vector of the character specifying the method to create knots for x1x_1 and x2x_2. It will not be used if the user specifies the knots argument. If "E" is used, then equally spaced knots will be created; if "Q" is used, then a vector of equal quantiles will be created with duplicate elements removed. The number of knots is numknots when numknots is a positive integer >4> 4. Otherwise it is of the order n1/6n^{1/6}. The default is space = c("E", "E").

Details

"s.decr.decr" returns the vectors "x1" and "x2", and imposes on each vector six attributes: name, shape, numknots, knots, space and decreasing.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.decr.decr"; the shape attribute is "wps_dd"(doubly-decreasing); the decreasing values for both vectors are TRUE. According to the value of the vector itself and its shape, numknots, knots, space and decreasing attributes, the cone edges will be made by warped-plane spline basis functions in Meyer (2016). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.decr.decr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta_wps.

See references cited in this section for more details.

Value

The vectors x1x_1 and x2x_2. Each of them has six attributes, i.e., name: names of x1x_1 and x2x_2; shape: "wps_dd"(doubly-decreasing); numknots: the numknots argument in "s.decr.decr"; knots: the knots argument in "s.decr.decr"; space: the space argument in "s.decr.decr"; decreasing: two logical values indicating the monotonicity of the isotonically-constrained surface with respect to x1x_1 and x2x_2, which are both TRUE.

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2017) Estimation and inference for regression surfaces using shape-constrained splines.

See Also

s.incr.incr, s.decr.incr, s.incr.decr, cgam

Examples

## Not run: 
  # generate data
  n <- 100
  set.seed(123)
  x1 <- runif(n)
  x2 <- runif(n)
  y <- -4 * (x1 + x2 - x1 * x2) + rnorm(n, sd = .2)

  # regress y on x1 and x2 under the shape-restriction: "doubly-decreasing" 
  # using the penalized estimator
  ans <- cgam(y ~ s.decr.decr(x1, x2), pnt = TRUE)

  # make a 3D plot of the constrained surface
  plotpersp(ans)	

## End(Not run)

Specify a Decreasing-Increasing Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that a surface is decreasing in one predictor and increasing in another in a formula argument to cgam.

Usage

s.decr.incr(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))

Arguments

x1

A numeric predictor which has the same length as the response vector.

x2

A numeric predictor which has the same length as the response vector.

numknots

A vector of the number of knots used to constrain x1x_1 and x2x_2. It will not be used if the user specifies the knots argument and each predictor is within the range of its knots. The default is numknots = c(0, 0).

knots

A list of two vectors of knots used to constrain x1x_1 and x2x_2. User-defined knots will be used if each predictor is within the range of its knots. Otherwise, numknots and space will be used to create knots. The default is knots = list(k1 = 0, k2 = 0).

space

A vector of the character specifying the method to create knots for x1x_1 and x2x_2. It will not be used if the user specifies the knots argument. If "E" is used, then equally spaced knots will be created; if "Q" is used, then a vector of equal quantiles will be created with duplicate elements removed. The number of knots is numknots when numknots is a positive integer >4> 4. Otherwise it is of the order n1/6n^{1/6}. The default is space = c("E", "E").

Details

"s.decr.incr" returns the vectors "x1" and "x2", and imposes on each vector six attributes: name, shape, numknots, knots, space and decreasing.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.decr.incr"; the shape attribute is "wps_di"(decreasing-increasing); the decreasing values for "x1" and "x2" are TRUE and FALSE. According to the value of the vector itself and its shape, numknots, knots, space and decreasing attributes, the cone edges will be made by warped-plane spline basis functions in Meyer (2016). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.decr.incr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta_wps.

See references cited in this section for more details.

Value

The vectors x1x_1 and x2x_2. Each of them has six attributes, i.e., name: names of x1x_1 and x2x_2; shape: "wps_di"(decreasing-increasing); numknots: the numknots argument in "s.decr.incr"; knots: the knots argument in "s.decr.incr"; space: the space argument in "s.decr.incr"; decreasing: two logical values indicating the monotonicity of the isotonically-constrained surface with respect to x1x_1 and x2x_2, which are TRUE and FALSE.

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2017) Estimation and inference for regression surfaces using shape-constrained splines.

See Also

s.incr.incr, s.incr.decr, s.decr.decr, cgam

Examples

## Not run: 
  # generate data
  n <- 100
  set.seed(123)
  x1 <- runif(n)
  x2 <- runif(n)
  y <- 4 * (x2 - x1) - x1 * x2 + rnorm(n, sd = .2)

  # regress y on x1 and x2 under the shape-restriction: "decreasing-increasing"
  # using the penalized estimator
  ans <- cgam(y ~ s.decr.incr(x1, x2), pnt = TRUE)

  # make a 3D plot of the constrained surface
  plotpersp(ans)

## End(Not run)

Specify a Smooth and Increasing Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is smooth and increasing in a predictor in a formula argument to cgam. This is the smooth version.

Usage

s.incr(x, numknots = 0, knots = 0, var.knots = 0, space = "Q", db.exp = FALSE)

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to constrain xx. It will not be used if the user specifies the knots argument. The default is numknots = 00.

knots

The knots used to constrain xx. User-defined knots will be used when given. Otherwise, numknots and space will be used to create knots. The default is knots = 00.

var.knots

The knots used in variance function estimation. User-defined knots will be used when given. The default is var.knots = 00.

space

A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal xx quantiles will be created based on xx with duplicate elements removed. The number of knots is numknots when numknots >0> 0. Otherwise it is of the order n1/7n^{1/7}. The default is space = "Q".

db.exp

The parameter will be used in variance function estimation. If db.exp = TRUE, then the errors are assumed to follow a normal distribution; otherwise, the errors are assumed to follow a double-exponential distribution. The default is db.exp = FALSE.

Details

"s.incr" returns the vector "x" and imposes on it seven attributes: name, shape, numknots, knots, space, var.knots and db.exp.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.incr"; the shape attribute is 9("smooth and increasing"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by I-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.incr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

var.knots and db.exp will be used for monotonic variance function estimation.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 9("smooth and increasing"); numknots: the numknots argument in "s.incr"; knots: the knots argument in "s.incr"; space: the space argument in "s.incr".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

See Also

incr

Examples

data(cubic)

  # extract x
  x <- cubic$x

  # extract y
  y <- cubic$y

  # regress y on x with the shape restriction: "smooth and increasing"
  ans <- cgam(y ~ s.incr(x))
  knots <- ans$knots[[1]]

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "smooth and increasing fit", col = 2, lty = 1)
  legend(1.7, 9.2, bty = "o", "knots", pch = "X")
  points(knots, 1:length(knots)*0+min(y), pch = "X")

Specify a Smooth, Increasing and Concave Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is smooth, increasing and concave in a predictor in a formula argument to cgam. This is the smooth version.

Usage

s.incr.conc(x, numknots = 0, knots = 0, space = "Q")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to constrain xx. It will not be used if the user specifies the knots argument. The default is numknots = 00.

knots

The knots used to constrain xx. User-defined knots will be used when given. Otherwise, numknots and space will be used to create knots. The default is knots = 00.

space

A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal xx quantiles will be created based on xx with duplicate elements removed. The number of knots is numknots when numknots >0> 0. Otherwise it is of the order n1/7n^{1/7}. The default is space = "Q".

Details

"s.incr.conc" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.incr.conc"; the shape attribute is 14("smooth, increasing and concave"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.incr.conc" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 14("smooth, increasing and concave"); numknots: the numknots argument in "s.incr.conc"; knots: the knots argument in "s.incr.conc"; space: the space argument in "s.incr.conc".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

See Also

incr.conv

Examples

data(cubic)

  # extract x
  x <- - cubic$x

  # extract y
  y <- - cubic$y

  # regress y on x with the shape restriction: "smooth, increasing and concave"
  ans <- cgam(y ~ s.incr.conc(x))
  knots <- ans$knots[[1]]

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "smooth, increasing and concave fit", col = 2, lty = 1)
  legend(-.3, 4, bty = "o", "knots", pch = "X")
  points(knots, 1:length(knots)*0+min(y), pch = "X")

Specify an Smooth, Increasing and Convex Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta is smooth, increasing and convex in a predictor in a formula argument to cgam. This is the smooth version.

Usage

s.incr.conv(x, numknots = 0, knots = 0, space = "Q")

Arguments

x

A numeric predictor which has the same length as the response vector.

numknots

The number of knots used to constrain xx. It will not be used if the user specifies the knots argument. The default is numknots = 00.

knots

The knots used to constrain xx. User-defined knots will be used when given. Otherwise, numknots and space will be used to create knots. The default is knots = 00.

space

A character specifying the method to create knots. It will not be used if the user specifies the knots argument. If space == "E", then equally spaced knots will be created; if space == "Q", then a vector of equal xx quantiles will be created based on xx with duplicate elements removed. The number of knots is numknots when numknots >0> 0. Otherwise it is of the order n1/7n^{1/7}. The default is space = "Q".

Details

"s.incr.conv" returns the vector "x" and imposes on it five attributes: name, shape, numknots, knots and space.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.incr.conv"; the shape attribute is 13("smooth, increasing and convex"). According to the value of the vector itself and its shape, numknots, knots and space attributes, the cone edges will be made by C-spline basis functions in Meyer (2008). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.incr.conv" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta in cgam.

See references cited in this section for more details.

Value

The vector x with five attributes, i.e., name: the name of x; shape: 13("smooth, increasing and convex"); numknots: the numknots argument in "s.incr.conv"; knots: the knots argument in "s.incr.conv"; space: the space argument in "s.incr.conv".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

See Also

incr.conv

Examples

data(cubic)

  # extract x
  x <- cubic$x

  # extract y
  y <- cubic$y

  # regress y on x with the shape restriction: "smooth, increasing and convex"
  ans <- cgam(y ~ s.incr.conv(x))
  knots <- ans$knots[[1]]

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, xlab = "x", ylab = "y")
  lines(x, ans$muhat, col = 2)
  legend("topleft", bty = "n", "smooth, increasing and convex fit", col = 2, lty = 1)
  legend(1.7, 9.2, bty = "o", "knots", pch = "X")
  points(knots, 1:length(knots)*0+min(y), pch = "X")

Specify an Increasing-Decreasing Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that a surface is decreasing in one predictor and increasing in another in a formula argument to cgam.

Usage

s.incr.decr(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))

Arguments

x1

A numeric predictor which has the same length as the response vector.

x2

A numeric predictor which has the same length as the response vector.

numknots

A vector of the number of knots used to constrain x1x_1 and x2x_2. It will not be used if the user specifies the knots argument and each predictor is within the range of its knots. The default is numknots = c(0, 0).

knots

A list of two vectors of knots used to constrain x1x_1 and x2x_2. User-defined knots will be used if each predictor is within the range of its knots. Otherwise, numknots and space will be used to create knots. The default is knots = list(k1 = 0, k2 = 0).

space

A vector of the character specifying the method to create knots for x1x_1 and x2x_2. It will not be used if the user specifies the knots argument. If "E" is used, then equally spaced knots will be created; if "Q" is used, then a vector of equal quantiles will be created with duplicate elements removed. The number of knots is numknots when numknots is a positive integer >4> 4. Otherwise it is of the order n1/6n^{1/6}. The default is space = c("E", "E").

Details

"s.incr.decr" returns the vectors "x1" and "x2", and imposes on each vector six attributes: name, shape, numknots, knots, space and decreasing.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.incr.decr"; the shape attribute is "wps_id"(increasing-decreasing); the decreasing values for "x1" and "x2" are TRUE and FALSE. According to the value of the vector itself and its shape, numknots, knots, space and decreasing attributes, the cone edges will be made by warped-plane spline basis functions in Meyer (2016). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.incr.decr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta_wps.

See references cited in this section for more details.

Value

The vectors x1x_1 and x2x_2. Each of them has six attributes, i.e., name: names of x1x_1 and x2x_2; shape: "wps_id"(increasing-decreasing); numknots: the numknots argument in "s.incr.decr"; knots: the knots argument in "s.incr.decr"; space: the space argument in "s.incr.decr"; decreasing: two logical values indicating the monotonicity of the isotonically-constrained surface with respect to x1x_1 and x2x_2, which are FALSE and TRUE.

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2017) Estimation and inference for regression surfaces using shape-constrained splines.

See Also

s.incr.incr, s.decr.decr, s.decr.incr, cgam

Examples

## Not run: 
  # generate data
  n <- 100
  set.seed(123)
  x1 <- runif(n)
  x2 <- runif(n)
  y <- 4 * (x1 - x2) - x1 * x2 + rnorm(n, sd = .2)

  # regress y on x1 and x2 under the shape-restriction: "increasing-decreasing"
  # using the penalized estimator
  ans <- cgam(y ~ s.incr.decr(x1, x2), pnt = TRUE)

  # make a 3D plot of the constrained surface
  plotpersp(ans)
 
## End(Not run)

Specify a Doubly-Increasing Shape-Restriction in a CGAM Formula

Description

A symbolic routine to define that a surface is increasing in two predictors in a formula argument to cgam.

Usage

s.incr.incr(x1, x2, numknots = c(0, 0), knots = list(k1 = 0, k2 = 0), space = c("E", "E"))

Arguments

x1

A numeric predictor which has the same length as the response vector.

x2

A numeric predictor which has the same length as the response vector.

numknots

A vector of the number of knots used to constrain x1x_1 and x2x_2. It will not be used if the user specifies the knots argument and each predictor is within the range of its knots. The default is numknots = c(0, 0).

knots

A list of two vectors of knots used to constrain x1x_1 and x2x_2. User-defined knots will be used if each predictor is within the range of its knots. Otherwise, numknots and space will be used to create knots. The default is knots = list(k1 = 0, k2 = 0).

space

A vector of the character specifying the method to create knots for x1x_1 and x2x_2. It will not be used if the user specifies the knots argument. If "E" is used, then equally spaced knots will be created; if "Q" is used, then a vector of equal quantiles will be created with duplicate elements removed. The number of knots is numknots when numknots is a positive integer >4> 4. Otherwise it is of the order n1/6n^{1/6}. The default is space = c("E", "E").

Details

"s.incr.incr" returns the vectors "x1" and "x2", and imposes on each vector six attributes: name, shape, numknots, knots, space and decreasing.

The name attribute is used in the subroutine plotpersp; the numknots, knots and space attributes are the same as the numknots, knots and space arguments in "s.incr.incr"; the shape attribute is "wps_ii"(doubly-increasing); the decreasing values for both vectors are FALSE. According to the value of the vector itself and its shape, numknots, knots, space and decreasing attributes, the cone edges will be made by warped-plane spline basis functions in Meyer (2016). The cone edges are a set of basis employed in the hinge algorithm.

Note that "s.incr.incr" does not make the corresponding cone edges itself. It sets things up to a subroutine called makedelta_wps.

See references cited in this section for more details.

Value

The vectors x1x_1 and x2x_2. Each of them has six attributes, i.e., name: names of x1x_1 and x2x_2; shape: "wps_ii"(doubly-increasing); numknots: the numknots argument in "s.incr.incr"; knots: the knots argument in "s.incr.incr"; space: the space argument in "s.incr.incr"; decreasing: two logical values indicating the monotonicity of the isotonically-constrained surface with respect to x1x_1 and x2x_2, which are both FALSE.

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2017) Estimation and inference for regression surfaces using shape-constrained splines.

See Also

s.decr.decr, s.decr.incr, cgam

Examples

## Not run: 
  # generate data
  n <- 100
  set.seed(123)
  x1 <- runif(n)
  x2 <- runif(n)
  y <- 4 * (x1 + x2 - x1 * x2) + rnorm(n, sd = .2)

  # regress y on x1 and x2 under the shape-restriction: "doubly-increasing" 
  # using the penalized estimator
  ans <- cgam(y ~ s.incr.incr(x1, x2), pnt = TRUE)

  # make a 3D plot of the constrained surface
  plotpersp(ans)

## End(Not run)

To Include a Non-Parametrically Modelled Predictor in a SHAPESELECT Formula

Description

A symbolic routine to indicate that a predictor is included as a non-parametrically modelled predictor in a formula argument to ShapeSelect.

Usage

shapes(x, set = "s.9")

Arguments

x

A numeric predictor which has the same length as the response vector.

set

A character or a numeric vector indicating all possible shapes defined for x. For example, we are not only interested in modeling the relationship between the growth of an organism (dependent variable yy) and time (independent variable xx), but we are also interested in the shape of the growth curve. Suppose we know a priori that the shape could be flat, increasing, increasing concave, or increasing convex, and we further know that the curve is smooth, we can write y ~ shapes(x, set = c("flat", "s.incr", "s.incr.conc", "s.incr.conv")) in a formula to impose the four possible shape constraints on the growth curve and model it with splines.

To be more specific, the user can choose to specify this argument as following

  • 1. It could be written as "s.5", "s.9", "ord.5", "ord.9", and "tree", where "s.5" ("ord.5") means that the relationship between the response and a predictor xx is modelled with regression splines (ordinal regression basis functions) with five possible shapes, i.e., flat, increasing, decreasing, convex, and concave; "s.9" ("ord.9") includes four more possible shapes, which are the combination of monotonicity and convexity; "tree" specifies that xx is included as an ordinal predictor with three possibilities: no effect, tree-ordering, and unordered effect.

  • 2. Or the user can choose any subset of the possible shapes, i.e., flat, increasing, decreasing, convex, concave, and combination of monotonicity and convexity. The symbols are "flat", "incr", "decr", "conv", "conc", "incr.conv", "decr.conv", "incr.conc", and "decr.conc". To specify a spline-based regression, the user needs write something like "s.incr", "s.decr", etc.

  • 3. It can also be a subset of integers between 0 and 16, where 0 is the flat shape, 1 ~ 8 indicate increasing, decreasing, convex, concave, increasing-convex, decreasing-convex, increasing-concave, and decreasing-concave, while 9 ~ 16 indicate the same shapes with a smooth assumption.

The default is set = "s.9".

Value

The vector x with three attributes, i.e., nm: the name of x; shape: a numeric vector ranging from 0 to 16 to indicate possible shapes imposed on the relationship between the response and x; type: "nparam", i.e., x is non-parametrically modelled.

Author(s)

Xiyue Liao

See Also

in.or.out, ShapeSelect

Examples

## Not run: 
# Example 1.
  n <- 100 
   
  # generate predictors, x is non-parametrically modelled 
  # and z is parametrically modelled
  x <- runif(n)
  z <- rep(0:1, 50)
  
  # E(y) is generated as correlated to both x and z
  # the relationship between E(y) and x is smoothly increasing-convex
  y <- x^2 + 2 * I(z == 1) + rnorm(n, sd = 1)

  # call ShapeSelect to find the best model by the genetic algorithm
  fit <- ShapeSelect(y ~ shapes(x) + in.or.out(factor(z)), genetic = TRUE)

# Example 2.
  n <- 100
  z <- rep(c("A","B"), n / 2)
  x <- runif(n)

  # y0 is generated as correlated to z with a tree-ordering in it
  # y0 is smoothly increasing-convex in x
  y0 <- x^2 + I(z == "B") * 1.5
  y <- y0 + rnorm(n, 1)

  fit <- ShapeSelect(y ~ s.incr(x) + shapes(z, set = "tree"), genetic = FALSE)
  
  # check the best fit in terms of z
  fit$top

## End(Not run)

Variable and Shape Selection via Genetic Algorithm

Description

The partial linear generalized additive model is considered, where the goal is to choose a subset of predictor variables and describe the component relationships with the response, in the case where there is very little a priori information. For each predictor, the user need only specify a set of possible shape or order restrictions. A model selection method chooses the shapes and orderings of the relationships as well as the variables. For each possible combination of shapes and orders for the predictors, the maximum likelihood estimator for the constrained generalized additive model is found using iteratively re-weighted cone projections. The cone information criterion is used to select the best combination of variables and shapes.

Usage

ShapeSelect(formula, family = gaussian, cpar = 2, data = NULL, weights = NULL, 
npop = 200, per.mutate = 0.05, genetic = FALSE)

Arguments

formula

A formula object which includes a set of predictors to be selected. It has the form "response ~ predictor". The response is a vector of length nn. The specification of the model can be one of the three exponential families: gaussian, binomial and poisson. The systematic component η\eta is E(y)E(y), the log odds of y=1y = 1, and the logarithm of E(y)E(y) respectively. The user is supposed to define at least one predictor in the formula, which could be a non-parametrically modelled variable or a parametrically modelled covariate (categorical or linear). Assume that η\eta is the systematic component and xx is a predictor, two symbolic routines shapes and in.or.out are used to include xx in the formula.

  • shapes(x): xx is included as a non-parametrically modelled predictor in the formula. See shapes for more details.

  • in.or.out(x): xx is included as a categorical or linear predictor in the formula. See in.or.out for more details.

family

A parameter indicating the error distribution and link function to be used in the model. It can be a character string naming a family function or the result of a call to a family function. This is borrowed from the glm routine in the stats package. There are three families used in ShapeSelect: gaussian, binomial and poisson.

cpar

A multiplier to estimate the model variance, which is defined as σ2=SSR/(nd0cparedf)\sigma^2 = SSR / (n - d_0 - cpar * edf). SSR is the sum of squared residuals for the full model, d0d_0 is the dimension of the linear space in the cone, and edf is the effective degrees of freedom. The default is cpar = 2. See Meyer, M. C. and M. Woodroofe (2000) for more details.

data

An optional data frame, list or environment containing the variables in the model. The default is data = NULL.

weights

An optional non-negative vector of "replicate weights" which has the same length as the response vector. If weights are not given, all weights are taken to equal 1. The default is weights = NULL.

npop

The population size used for the genetic algorithm. The default is npop = 200.

per.mutate

The percentage of mutation used for the genetic algorithm. The default is per.mutate = 0.05

genetic

A logical scalar showing if or not the genetic algorithm is defined by the user to select the best model. The default is genetic = FALSE.

Details

Note that when the argument genetic is set to be FALSE, the routine will check to see if using the genetic algorithm is better than going through all models to find the best fit. The primary concern is running time. An interactive dialogue window may pop out to ask the user if they prefer to the genetic algorithm when it may take too long to do a brutal search, and if there are too many possible models to go through, like more than one million, the routine will implement the genetic algorithm anyway.

See references cited in this section for more details.

Value

top

The best model of the final population, which shows the variables chosen along with their best shapes.

pop

The final population ordered by its fitness values. It is a data frame, and each row of this data frame shows the shapes chosen for predictors in an individual model. Besides, the fitness value for each individual model is included in the last column of the data frame. For example, we have two continuous predictors x1x_1, x2x_2, and a categorical predictor zz, then a row of this data frame may look like: "flat", "s.incr", "in", 12.3806-12.3806, which means that x1x_1 is not chosen, x2x_2 is chosen with the shape constraint to be smoothly increasing, zz is included in the model, and the fitness value for the model is 12.3806-12.3806.

fitness

The sorted fitness values for the final population.

tm

Total cpu running time.

xnms

A vector storing the name of the nonparametrically-modelled predictor in a ShapeSelect formula.

znms

A vector storing the name of the parametrically-modelled predictor in a ShapeSelect formula, which is a categorical predictor or a linear term.

trnms

A vector storing the name of the treatment predictor in a ShapeSelect formula, which has three possible levels: no effect, tree ordering, unordered.

zfacs

A logical vector keeping track of if the parametrically-modelled predictor in a ShapeSelect formula is a categorical predictor or a linear term.

mxf

A vector keeping track of the largest fitness value in each generation.

mnf

A vector keeping track of the mean fitness value in each generation.

GA

A logical scalar showing if or not the genetic algorithm is actually implemented to select the best model.

best.fit

The best model fitted by the cgam routine, given the best variables with their shape constraints chosen by the ShapeSelect routine.

call

The matched call.

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013a) Semi-parametric additive constrained regression. Journal of Nonparametric Statistics 25(3), 715

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

Meyer, M. C. and M. Woodroofe (2000) On the degrees of freedom in shape-restricted regression. Annals of Statistics 28, 1083–1104.

Meyer, M. C. (2008) Inference using shape-restricted regression splines. Annals of Applied Statistics 2(3), 1013–1033.

Meyer, M. C. and Liao, X. (2016) Variable and shape selection for the generalized additive model. under preparation

See Also

cgam

Examples

## Not run: 
# Example 1.
  library(MASS)
  data(Rubber)

  # ShapeSelect can be used to go through all models to find the best model
  fit <- ShapeSelect(loss ~ shapes(hard, set = "s.9") + shapes(tens, set = "s.9"),
  data = Rubber, genetic = FALSE)
 
  # the user can also choose to find the best model by the genetic algorithm
  # given any total number of possible models
  fit <- ShapeSelect(loss ~ shapes(hard, set = "s.9") + shapes(tens, set = "s.9"),
  data = Rubber, genetic = TRUE)

  # check the best model
  fit$top

  # check the running time
  fit$tm

# Example 2.
  # simulate a data set such that the mean is smoothly increasing-convex in x1 and x2
  n <- 100
  x1 <- runif(n)
  x2 <- runif(n)
  y0 <-  x1^2 + x2 + x2^3

  z <- rep(0:1, 50)
  for (i in 1:n) {
    if (z[i] == 1) 
      y0[i] = y0[i] * 1.5
  }

  # add some random errors and call the routine
  y <- y0 + rnorm(n)

  # include factor(z) in the formula and determine if factor(z) should be in the model 
  fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + in.or.out(factor(z)))
  
  # use the genetic algorithm
  fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + in.or.out(factor(z)),
   npop = 300, per.mutate=0.02)

  # include z as a linear term in the formula and determine if z should be in the model 
  fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + in.or.out(z))

  # include z as a linear term in the model 
  fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + z)

  # include factor(z) in the model 
  fit <- ShapeSelect(y ~ shapes(x1, set = "s.9") + shapes(x2, set = "s.9") + factor(z))

  # check the best model
  bf <- fit$best.fit 
 
  # make a 3D plot of the best fit
  plotpersp(bf, categ = "z")

## End(Not run)

Specify a Tree-Ordering in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta has a tree-ordering in a predictor in a formula argument to cgam.

Usage

tree(x, pl = NULL)

Arguments

x

A numeric vector which has the same length as the response vector. Note that the placebo level of x must be 0.

pl

The placebo level.

Details

"tree" returns the vector "x" and imposes on it two attributes: name and shape.

The name attribute is used in the subroutine plotpersp; the shape attribute is "tree", and according to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains that η\eta has a tree-ordering in "x" will be made. The cone edges are a set of basis employed in the hinge algorithm.

Note that "tree" does not make the corresponding cone edges itself. It sets things up to a sub-routine called tree.fun in cgam which will make the cone edges. A tree-ordering is a partial ordering: For a categorical variable xx, if there are treatment levels x1,,xkx_1,\ldots,x_k, where x1x_1 is a placebo, we compare xi,i=2,,kx_i, i = 2,\ldots,k with x1x_1, and not have any other comparable pairs.

See references cited in this section for more details.

Value

The vector x with two attributes, i.e., name: the name of x; shape: "tree".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

See Also

umbrella

Examples

# generate y
  set.seed(123)
  n <- 100 
  x <- rep(0:4, each = 20)
  z <- rep(c("a", "b"), 50)
  y <- x + I(z == "a") + rnorm(n, 1)
  xu <- unique(x)

  # regress y on x under the tree-ordering restriction
  fit.tree <- cgam(y ~ tree(x) + factor(z)) 

  # make a plot
  plot(x, y, cex = .7)
  mua = unique(fit.tree$muhat)[unique(z) == "a"]
  points(xu, unique(fit.tree$muhat)[unique(z) == "a"], pch = '+', col = 4, cex = 3)
  legend(0,7.5, bty = "n", "tree-ordering fit: z = 'a'", col = 4, pch = '+', cex = 1.3)
  mub = unique(fit.tree$muhat)[unique(z) == "b"] 
  points(xu, unique(fit.tree$muhat)[unique(z) == "b"], pch = '+', col = 2, cex = 3)
  legend(0,8.5, bty = "n", "tree-ordering fit: z = 'b'", col = 2, pch = '+', cex = 1.3)

Specify an Umbrella-Ordering in a CGAM Formula

Description

A symbolic routine to define that the systematic component η\eta has an umbrella-ordering in a predictor in a formula argument to cgam.

Usage

umbrella(x)

Arguments

x

A numeric vector which has the same length as the response vector.

Details

"umbrella" returns the vector "x" and imposes on it two attributes: name and shape.

The name attribute is used in the subroutine plotpersp; the shape attribute is "umbrella", and to the value of the vector itself and its shape attribute, the cone edges of the cone generated by the constraint matrix, which constrains that η\eta has an umbrella-ordering in "x" will be made. The cone edges are a set of basis employed in the hinge algorithm.

Note that "umbrella" does not make the corresponding cone edges itself. It sets things up to a sub-routine called umbrella.fun in cgam which will make the cone edges. An umbrella-ordering is a partial ordering: Suppose we have a x0x_0 that is known to be a "mode" so that for x,y>=x0x, y >= x_0, we have a binary relation between xx and yy if x<=yx <= y and for x,y<=x0x, y <= x_0 we have the opposite binary relation if x<=yx <= y, but if x<x0x < x_0 and y>x0y > x_0, there is no such binary relation.

See references cited in this section for more details.

Value

The vector x with two attributes, i.e., name: the name of x; shape: "umbrella".

Author(s)

Mary C. Meyer and Xiyue Liao

References

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126–1139.

See Also

tree

Examples

# generate y
  set.seed(123)
  n <- 20
  x <- seq(-2, 2, length = n)
  y <- - x^2 + rnorm(n)

  # regress y on x under the umbrella-ordering restriction
  fit <- cgam(y ~ umbrella(x)) 

  # make a plot
  par(mar = c(4, 4, 1, 1))
  plot(x, y, cex = .7, ylab = "y")
  lines(x, fit$muhat, col = 2)
  legend("topleft", bty = "n", "umbrella-ordering fit", col = 2, lty = 1)