Package 'simplexreg'

Title: Regression Analysis of Proportional Data Using Simplex Distribution
Description: Simplex density, distribution, quantile functions as well as random variable generation of the simplex distribution are given. Regression analysis of proportional data using various kinds of simplex models is available. In addition, GEE method can be applied to longitudinal data to model the correlation. Residual analysis is also involved. Some subroutines are written in C with GNU Scientific Library (GSL) so as to facilitate the computation.
Authors: Peng Zhang, Zhengguo Qiu and Chengchun Shi
Maintainer: Peng Zhang <[email protected]>
License: GPL-2
Version: 1.3
Built: 2024-12-05 06:57:46 UTC
Source: CRAN

Help Index


Plots for simplexreg Objects

Description

Various types of plots could be produced for simplexreg Objects, including plots of correlation structure, plots of different types of residuals and plots of partial deviance.

Usage

## S3 method for class 'simplexreg'
plot(x, type = c("residuals", "corr", "GOF"), res = "adjvar", lag = 1, ...)

Arguments

x

fitted model object of class "simplexreg"

type

character specifying types of plots: the correlation (corr), residuals (residuals), partial deviances (GOF). See 'Details'

res

character specifying types of residuals:approximate Pearson residual (appstdPerr), standard Pearson residual (stdPerr), adjusted dependent variable sis_i (adjvar). See residuals.simplexreg

lag

when type = corr, this function examine the autocorrelation at lag lag

...

other parameters to be passed through to the plot function

Details

This function provides graphical presentations for simplexreg objects. The plot of correlation aims examine the correlation structure of the longitudinal data set. Let rijr_{ij} be the standardised score residuals of the ith observation at time tijt_{ij}, and lag = k, then rijr_{ij} are plotted against rikr_{ik} for all ii and j<kj < k, if tijtik=k|t_{ij} - t_{ik}| = k.

Residuals can be plotted when specifying type = "residuals", The upper and lower 95 (1.96) are also lined.

Plots of partial deviance are for the goodness-of-fit test in the presence of within-subject dependence for longitudinal data. The partial deviances are defined as

DjP=i=1mjd(yijμ^ij)/σij2,jTD_j^P=\sum_{i=1}^{m_j}d(y_{ij}-\hat{\mu}_{ij}) / \sigma_{ij}^2, j \in T

where T denotes a collection of all distinct times on which observation are made. Cross-sectionally, yijy_{ij}'s are independent and hence DjPD_j^P follows approximately χ2\chi^2, with mjm_j being the total number of yijy_{ij}'s observed cross-sectionally at time tjt_j. Both observed partial deviance DjPD_j^P statistics and the corresponding critical values are depicted and compared at each time point.

Author(s)

Chengchun Shi

References

Song, P. and Qiu, Z. and Tan, M. (2004) Modelling Heterogeneous Dispersion in Marginal Models for Longitudinal Proportional Data. Biometrical Journal, 46: 540–553

Qiu Z. (2001) Simplex Mixed Models for Longitudinal Proportional Data. Ph.D. Dissertation, York University

Zhang, P. and Qiu, Z. and Shi, C. (2016) simplexreg: An R Package for Regression Analysis of Proportional Data Using the Simplex Distribution. Journal of Statistical Software, 71: 1–21

See Also

summary.simplexreg, residuals.simplexreg

Examples

## fit the model
data("sdac", package="simplexreg")
sim.glm2 <- simplexreg(rcd~ageadj+chemo|age, 
  link = "logit", data = sdac)
	
data("retinal", package = "simplexreg")
sim.gee2 <- simplexreg(Gas~LogT+LogT2+Level|LogT+Level|Time, 
  link = "logit", corr = "AR1", id = ID, data = retinal)  

## produce the plots
plot(sim.glm2, type = "residuals", res = "stdPerr", ylim = c(-3, 3))
plot(sim.gee2, type = "corr", xlab = "", ylab = "")
plot(sim.gee2, type = "GOF", xlab = "", ylab = "")

Prediction Method for simplexreg Objects

Description

Predicted values based on simplex regression object

Usage

## S3 method for class 'simplexreg'
predict(object, newdata = NULL, type = c("response", "dispersion"), 
   	na.action, ...)

Arguments

object

fitted model object of class "simplexreg"

newdata

an optional data frame in which to look for variables with which to predict. If omitted, original observations are used.

type

character indicating type of predictions:fitted mean of response ("response") or fitted dispersion parameter("dispersion")

na.action

function determining what should be done with missing values in newdata

...

currently not used

Author(s)

Chengchun Shi

See Also

simplexreg

Examples

## fit the model
data("sdac", package="simplexreg")
sim.glm2 <- simplexreg(rcd~ageadj+chemo|age, link = "logit", 
  data = sdac)

data("retinal", package = "simplexreg")
sim.gee2 <- simplexreg(Gas~LogT+LogT2+Level|LogT+Level|Time,
  link = "logit", corr = "AR1", id = ID, data = retinal)  	

## predict the mean
predict(sim.glm2, type = "response")

## predict the dispersion
predict(sim.gee2, type = "dispersion")

Extract residuals for simplexreg Objects

Description

Methods for extracting various types of residuals from simplex regression, from approximate Pearson residuals, standard Pearson residuals and standardise score residuals to adjusted dependent variable suggested by McCullagh and Nelder (1989). The first three can be used to examine mean-variance relation while the last aims to test the link function.

Usage

## S3 method for class 'simplexreg'
residuals(object, type = c("appstdPerr", "stdPerr", "stdscor", "adjvar"), 
   	...)

Arguments

object

fitted model object of class "simplexreg"

type

character specifying types of residuals:approximate Pearson residual (appstdPerr), standard Pearson residual (stdPerr), adjusted dependent variable sis_i (adjvar). Details are given in 'Details'

...

currently not used

Details

The Pearson residual takes the form

riP=yiμ^iτ^ir_i^P=\frac{y_i-\hat{\mu}_i}{\hat{\tau}_i}

where μ^i\hat{\mu}_i is the fitted mean parameter and details about calculation of τ\tau is given in Jorgensen (1997). When the dispersion parameter σ2\sigma^2 (see simplex) is large the variance of response approaches to μ(1μ)\mu(1-\mu) and this leads to the approximate Pearson residual

ria=yiμ^iμ^i(1μ^i)r_i^a=\frac{y_i-\hat{\mu}_i}{\sqrt{\hat{\mu}_i(1-\hat{\mu}_i)}}

Plot of the standardised score residuals,

riS=uivar(ui)r_i^S=\frac{u_i}{\sqrt{var(u_i)}}

where uiu_i is the score function, can also detect model assumption violation. Details can be found in Song et al. (2004). The adjusted dependent variable suggested by McCullagh and Nelder (1989) could be employed as an informal check for the link function,

si=g(μi)+(3σ4μi(1μi)+σ2V(μi))1/2u(yi;μi)s_i = g(\mu_i) + (\frac{3\sigma^4}{\mu_i(1-\mu_i)}+\frac{\sigma^2}{V(\mu_i)})^{-1/2} u(y_i;\mu_i)

where u(yi;μi)u(y_i;\mu_i) and V(μi)V(\mu_i) are the score function and variance function.

Author(s)

Chengchun Shi

References

Barndorff-Nielsen, O.E. and Jorgensen, B. (1991) Some parametric models on the simplex. Journal of Multivariate Analysis, 39: 106–116

Jorgensen, B. (1997) The Theory of Dispersion Models. London: Chapman and Hall

McCullagh, P and Nelder J. (1989) Generalized Linear Models. London: Chapman and Hall

Song, P. and Qiu, Z. and Tan, M. (2004) Modelling Heterogeneous Dispersion in Marginal Models for Longitudinal Proportional Data. Biometrical Journal, 46: 540–553

Zhang, P. and Qiu, Z. and Shi, C. (2016) simplexreg: An R Package for Regression Analysis of Proportional Data Using the Simplex Distribution. Journal of Statistical Software, 71: 1–21

See Also

summary.simplexreg, simplex

Examples

## fit the model
data("sdac", package="simplexreg")
sim.glm2 <- simplexreg(rcd~ageadj+chemo|age,
  link = "logit", data = sdac)
	
data("retinal", package = "simplexreg")
sim.gee2 <- simplexreg(Gas~LogT+LogT2+Level|LogT+Level|Time,
  link = "logit", corr = "AR1", id = ID, data = retinal)  

## extract the residuals
res <- residuals(sim.glm2, type = "stdPerr")
res <- residuals(sim.gee2, type = "adjvar")

Data on recorded decay of intraocular gas in complex retinal surgeries

Description

The study recorded the decay of intraocular gas C3F8C_3F_8 in complex retinal surgeries following initial injection in an ophthalmology study, reported in Meyers et al. (1992). The outcome variable was the percent of gas left in the eye. The gas, with three different concentration levels, 15%, 20% and 25% was injected into the eye before surgery for 31 patients. They were then followed three to eight (average of five) times over a three-month period, and the volume of gas in the eye at the follow-up times were recorded as a percentage of the initial gas volume. The primary interest was to investigate whether concentration levels of the gas injected in patients' eyes affect the decay rate of the gas.

Usage

data("retinal")

Format

A data frame with 181 observations on the following 6 variables.

Gas

Percentage of the initial gas volume left

Time

Time covariate of days after the gas injection

LogT

Logarithm of Time

LogT2

Square of LogT

Level

Concentration levels of the initial intraocular gas. For each patient, Level = -1 if the gas concentration level is 15%, Level = 0 if 20%, or 1 if 25%.

ID

A factor indicating patients.

References

Meyers, S. M. and Ambler, J. S. and Tan, M. (1992) Variation of Perfluorpropane Disappearance after Vitrectomy. Retinal, 12: 359–363

Zhang, P. and Qiu, Z. and Shi, C. (2016) simplexreg: An R Package for Regression Analysis of Proportional Data Using the Simplex Distribution. Journal of Statistical Software, 71: 1–21


Data on Autologous Peripheral Blood Stem Cell Transplants in Alberta Health Service

Description

Autologous peripheral blood stem cell (PBSC) transplants have been widely used for rapid hematologic recovery following myeloablative therapy for various malignant hematological disorders. A study enrolled 242 patients who consented to autologous PBSC transplant after myeloablative doses of chemotherapy between year 2003 and 2008 at the Edmonton Hematopoietic Stem Cell Lab in Cross Cancer Institute - Alberta Health Services. The Data is a data frame containing information about the patients' age, gender, as well as their clinical characteristics.

Usage

data("sdac")

Format

A data frame with 239 observations on the following 5 variables.

age

patients' ages

gender

patients' genders

rcd

recovery rates for viable CD34+ cells

chemo

dummy variable indicating if a patient receives a chemotherapy on a one-day protocol(0) or on a 3-day protocol(1)

ageadj

adjusted age variable. age < 40 is set as the baseline age and other ages are adjusted by subtracting by 40

References

Allan, D. and Keeney, M. and Howson-Jan, K. and Popma, J. and Weir, K. and Bhatia, M. and Sutherland, D. and Chin_yee, I. (2002) Number of Viable CD34+ Cells Reinfused Predicts Engraftment in Autologous Hematopoietic Stem Cell Transplantation. BONE MARROW TRANSPL, 20: 967-72

Yang, H. and Acker, J. and Cabuhat, M. and Letcher, B. and Larratt, L. and McGann, L. (2005) Association of Post_Thaw viable CD34+ Cells and CFU-GM with Time to Hematopoietic Engraftment. BONE MARROW TRANSPL, 35: 881-887

Zhang, P. and Qiu, Z. and Shi, C. (2016) simplexreg: An R Package for Regression Analysis of Proportional Data Using the Simplex Distribution. Journal of Statistical Software, 71: 1–21


The Simplex Distribution Functions

Description

Density, cumulative distribution function, quantile function and random variable generation for the simplex distribution with mean equal to mu and dispersion equal to sig

Usage

dsimplex(x, mu, sig)
psimplex(q, mu, sig)
qsimplex(p, mu, sig)
rsimplex(n, mu, sig)
psimplex.norm(q, mu, sig)
qsimplex.norm(p, mu, sig)

Arguments

x, q

vector of quantiles

p

vector of probabilities

n

number of observations

mu

vector of means

sig

vector of square root of dispersion parameter of simplex distribution

Details

The simplex distribution has density

p(y)=(2πσ2)12(y(1y))32exp(12σ2d(y;μ)),y(0,1)p(y) = (2\pi \sigma^2)^{-\frac{1}{2}} (y(1-y))^{-\frac{3}{2}} \exp(-\frac{1}{2\sigma^2} d(y;\mu)), y \in (0, 1)

where d(y;μ)d(y;\mu) is a unit deviance function

d(y;μ)=(yμ)2y(1y)μ2(1μ)2d(y;\mu) = \frac{(y-\mu)^2}{y(1-y) \mu^2 (1-\mu)^2}

μ\mu is the mean of simplex distribution and σ2\sigma^2 the dispersion parameter. qnorm provides results up to about 6 digits.

Value

dsimplex gives density function, psimplex gives the distribution function, qsimplex gives quantile function and rsimplex gives random number generated from the simplex distribution. psim.norm and qsimplex.norm gives the renormalized distribution and quantile function.

Author(s)

Peng Zhang and Zhenguo Qiu

References

Barndorff-Nielsen, O.E. and Jorgensen, B. (1991) Some parametric models on the simplex. Journal of Multivariate Analysis, 39: 106–116

Jorgensen, B. (1997) The Theory of Dispersion Models. London: Chapman and Hall

Song, P. and Qiu, Z. and Tan, M. (2004) Modelling Heterogeneous Dispersion in Marginal Models for Longitudinal Proportional Data. Biometrical Journal, 46: 540–553

Examples

# simplex distribution function
dsimplex(seq(0.01,0.99,0.01), 0.5, 1)
psimplex(seq(0.01,0.99,0.01), 0.5, 1)
qsimplex(seq(0.01,0.99,0.01), 0.5, 1)

# random variable generation
n <- 200
ga0 <- 1.5
ga1 <- 0.5
ga2 <- -0.5
sigma <- 4
T <- c(rep(0, n/2), rep(1, n/2))
S <- runif(n, 0, 5)
eta <- ga0 + ga1 * T + ga2 * S
mu <- exp(eta)/(1+exp(eta))
Y <- rep(0, n)
for (i in 1:n){ 
  Y[i] <- rsimplex(1, mu[i], sigma)
}

Simplex Generalized Linear Model Regression Function

Description

Regression Analysis of Proportional Data Using Various Types of Simplex Models

Usage

simplexreg(formula, data, subset, na.action, 
   	link = c("logit", "probit", "cloglog", "neglog"), corr = "Ind", id = NULL, 
   	control = simplexreg.control(...), model = TRUE, y = TRUE, x = TRUE, ...)
	
simplexreg.fit(y, x, z = NULL, t = NULL, link = "logit", corr = "Ind",
   	id = NULL, control = simplexreg.control())

Arguments

formula

a symbolic description of the model to be fitted(of type y ~ x or y ~ x | z | t. The Details are given under 'Details').

data

an optional data frame, list or environment containing variables in formula and id.

subset, na.action

arguments controlling formula processing via model.frame

link

type of link function to the mean. Currently, "logit"(logit function), "probit"(probit function), "cloglog"(complementary log-log function), "neglog"(negative log function) are supported.

corr

the covariance structure, chosen from "Ind"(independent structure), "Exc"(exchangeability) and "AR1"(AR(1)), see Details

id

a factor identifies the clusters when gee = TRUE. The length of id should be the same as the number of observations. y, x, z, t are assumed to be sorted in accordance with clusters specified by id

control

a list of control argument via simplexreg.control

model

a logical value indicating whether model frame should be included as a component of the return value

y, x

For simplexreg:logical values indicating whether response vector and covariates modelling the mean parameter should be returned as components of the returned value For simplexreg.fit:x is the design matrix and y is the response vector

z

regressor matrix modelling the dispersion parameter

t

time covariate in the correlation structure, see Details

...

argument passed to simplexreg.control

Details

Outcomes of continuous proportions arise in many applied areas. Such data could be properly modelled using simplex regression. See also simplex. The mean and dispersion parameters are linked to set of regressors. Regression analysis of the simplex model is implemented in simplexreg. If corr = "Ind", simplex generalized regression model is employed. Estimations is performed by maximum likelihood via Fisher scoring technique.

Apart from including generalized simplex regression models, this function also provides users with generalized estimating equations (GEE) techniques to model longitudinal proportional response. Exchangeability and AR(1) structures are available. Parameter estimation and residual analysis are involved.

We employ the specification approach designed in the fitting model function betareg of beta regression in the package betareg. As for simplex regression models, assuming the dispersion is homogeneous, the response is linked to a linear predictor described by y ~ x1 + x2 using a link function. Four types of function are available linking the regressors to the mean. However, for dispersion, the link function is restricted to logarithm function. When modeling dispersion, the regressor modelling the dispersion parameter should be specified in a formula form of type y ~ x1 + x2 | z1 + z2 where z1 and z2 are linked to the dispersion parameter σ2\sigma^2.

Model specification is a bit complicated when it comes to modelling longitudinal proportional response. Song et. al (2004) proposed a marginal simplex model consists of three components, the population-average effects, the pattern of dispersion and the correlation structure. Let the percentage responses for the iith subject be yijy_{ij}, observed at time tijt_{ij}. If corr = "AR1", the working covariance matrix of yijy_{ij}, j=1,2,...,nij = 1, 2, ..., n_i, is

exp(αtiktij)kj{\exp(\alpha * |t_{ik} - t_{ij}|)}_{kj}

where α<0\alpha < 0 and exp(α)\exp(\alpha) is the lag-1 autocorrelation. If corr = "Exc", the covariance matrix will be (1exp(α))I+exp(α)1(1 - exp(\alpha)) I + exp(\alpha) 1 where I is the identity matrix while 1 the matrix with all elements being equal to one.

For homogeneous dispersion, the formula is supposed to be of the form y ~ x1 + x2 | 1 | t where tt is the time covariate. Otherwise, the formula will be of the form y ~ x1 + x2 | z1 + z2 | t.

Value

fixef

estimates of coefficients modelling the mean as well as the standard deviation

dispar

estimates of coefficients modelling dispersion as well as the standard deviation

Dispersion

estimate of the dispersion parameter

appstdPerr

approximated standard deviations of the regression coefficients

stdPerr

exact standard deviations of the regression coefficients

meanmu

estimate of mean parameter

adjvar

adjusted dependent variable sis_i. Details could be found in McCullagh and Nelder (1989)

stdscor

standardised score residuals. Details can be found in Song et al. (2004)

predict

predicted values of g(μi)g(\mu_i) where g is the link function and μi\mu_i the mean parameter

loglike

value of maximum log-likelihood function

deviance

value of deviance

call

the original function call

formula

the original formula

terms

a list with elements "mean" and "dispersion" containing term object for the model

levels

a list with elements "mean" and "dispersion" containing levels of categorical regressors

link

type of function linking to the mean

type

type = "homo" for homogeneous dispersion while type = "hetero" for heterogeneous dispersion

model

the full model frame (if model = TRUE)

y

response vector (if y = TRUE)

x

a list with elements mean, dispersion, time and id containing corresponding variables

n

number of proportional observations

iter

number of Fisher iterations

...

argument passed to simplexreg.control

Author(s)

Zhenguo Qiu, Peng Zhang and Chengchun Shi

References

Barndorff-Nielsen, O.E. and Jorgensen, B. (1991) Some parametric models on the simplex. Journal of Multivariate Analysis, 39: 106–116

Jorgensen, B. (1997) The Theory of Dispersion Models. London: Chapman and Hall

McCullagh, P and Nelder J. (1989) Generalized Linear Models. London: Chapman and Hall

Song, P. and Qiu, Z. and Tan, M. (2004) Modelling Heterogeneous Dispersion in Marginal Models for Longitudinal Proportional Data. Biometrical Journal, 46: 540–553

Zhang, P. and Qiu, Z. and Shi, C. (2016) simplexreg: An R Package for Regression Analysis of Proportional Data Using the Simplex Distribution. Journal of Statistical Software, 71: 1–21

See Also

simplex

Examples

# GLM models
data("sdac", package = "simplexreg")
sim.glm1 <- simplexreg(rcd~ageadj+chemo, link = "logit", 
  data = sdac)
sim.glm2 <- simplexreg(rcd~ageadj+chemo|age, link = "logit", 
  data = sdac)

# GEE models
data("retinal", package = "simplexreg")
sim.gee1 <- simplexreg(Gas~LogT+LogT2+Level|1|Time, link = "logit", 
  corr = "Exc", id = ID, data = retinal)
sim.gee2 <- simplexreg(Gas~LogT+LogT2+Level|LogT+Level|Time, 
  link = "logit", corr = "AR1", id = ID, data = retinal)

Control Parameters for Simplex Regression

Description

Various parameters that control fitting of simplex regression models using simplexreg.

Usage

simplexreg.control(maxit = 200, beta = NULL, gamma = NULL, alpha = NULL, 
   	tol = 1e-6, ...)

Arguments

maxit

maximum number of iterations

beta

start value for beta modelling the mean parameter

gamma

start value for gamma modelling the dispersion

alpha

start value for alpha modelling correlation structure using GEEs, see Song et.al (2004)

tol

numeric tolerance for convergence in Fisher scoring

...

currently not used

Value

A list with the arguments specified.

See Also

simplexreg

Examples

# GLM models
data("sdac", package = "simplexreg")
sim.glm1 <- simplexreg(rcd~ageadj+chemo, link = "logit", 
  data = sdac, beta = c(1.115, 0.013, 0.252))
sim.glm2 <- simplexreg(rcd~ageadj+chemo|age, link = "logit", 
  data = sdac, beta = c(1.115, 0.013, 0.252), gamma = c(2.61, -0.015))

# GEE models
data("retinal", package = "simplexreg")
sim.gee1 <- simplexreg(Gas~LogT+LogT2+Level|1|Time, link = "logit", 
  corr = "Exc", id = ID, data = retinal, beta = c(2.72, 0.034, -0.329, 0.409), 
  alpha = -0.3)
sim.gee2 <- simplexreg(Gas~LogT+LogT2+Level|LogT+Level|Time, 
  link = "logit", corr = "AR1", id = ID, data = retinal, alpha = -0.3,
  beta = c(2.72, 0.034, -0.329, 0.409))

Extracting Information from Objects simplexreg

Description

Methods for extracting information from fitted simplex regression model objects of class "simplexreg"

Usage

## S3 method for class 'simplexreg'
## S3 method for class 'simplexreg'
summary(object, type = "stdPerr", ...)

## S3 method for class 'simplexreg'
## S3 method for class 'simplexreg'
coef(object, ...)

## S3 method for class 'simplexreg'
## S3 method for class 'simplexreg'
vcov(object, ...)

Arguments

object

fitted model object of class "simplexreg"

type

character specifying type of residuals to be included, see residuals.simplexreg

...

currently not used

Details

These functions make it possible to extract information from objects of class "simplexreg". Wald statistics as well as the p-values of regression coefficients are given in the summary output. If GEE = FALSE, based on the fitted coefficients, a χ2\chi^2 test is performed and the p-value is reported in the output. Otherwise, coefficients of the autocorrelation α\alpha, ρ\rho, (see Song et. al (2004)), are also involved.

Model coefficients and their covariance matrix could be extracted by the coef, and vcov, respectively. For simplex GLM models (GEE = FALSE), Akaike Information Criterion and Bayesian Information Criterion could be calculated using generic functions AIC and BIC, respectively.

Author(s)

Chengchun Shi

References

Barndorff-Nielsen, O.E. and Jorgensen, B. (1991) Some parametric models on the simplex. Journal of Multivariate Analysis, 39: 106–116

Jorgensen, B. (1997) The Theory of Dispersion Models. London: Chapman and Hall

Song, P. and Qiu, Z. and Tan, M. (2004) Modelling Heterogeneous Dispersion in Marginal Models for Longitudinal Proportional Data. Biometrical Journal, 46: 540–553

Zhang, P. and Qiu, Z. and Shi, C. (2016) simplexreg: An R Package for Regression Analysis of Proportional Data Using the Simplex Distribution. Journal of Statistical Software, 71: 1–21

See Also

simplexreg, residuals.simplexreg

Examples

## fit the model
data("sdac", package = "simplexreg")
sim.glm2 <- simplexreg(rcd~ageadj+chemo|age, link = "logit", 
  data = sdac)

data("retinal", package = "simplexreg")
sim.gee2 <- simplexreg(Gas~LogT+LogT2+Level|LogT+Level|Time,
  link = "logit", corr = "AR1", id = ID, data = retinal)  
  
## extract information
summary(sim.glm2, type = "appstdPerr")
coef(sim.glm2)
vcov(sim.glm2)
AIC(sim.glm2)
BIC(sim.glm2)

summary(sim.gee2, type = "stdscor")
coef(sim.gee2)
vcov(sim.glm2)