Package 'MAVE' reference manual

Title:	Methods for Dimension Reduction
Description:	Functions for dimension reduction, using MAVE (Minimum Average Variance Estimation), OPG (Outer Product of Gradient) and KSIR (sliced inverse regression of kernel version). Methods for selecting the best dimension are also included. Xia (2002) <doi:10.1111/1467-9868.03411>; Xia (2007) <doi:10.1214/009053607000000352>; Wang (2008) <doi:10.1198/016214508000000418>.
Authors:	Hang Weiqiang<E0010758@u.nus.edu>, Xia Yingcun<staxyc@nus.edu.sg>
Maintainer:	Hang Weiqiang<E0010758@u.nus.edu>
License:	GPL (>= 2)
Version:	1.3.11
Built:	2025-02-18 07:41:04 UTC
Source:	CRAN

Directions of CS or CMS of given dimension

Description

This function returns the basis matrix of CS or CMS of given dimension

Usage

## S3 method for class 'mave'
coef(object, dim, ...)

## S3 method for class 'mave.dim'
coef(object, dim = "dim.min", ...)
## S3 method for class 'mave'
coef(object, dim, ...)

## S3 method for class 'mave.dim'
coef(object, dim = "dim.min", ...)

Arguments

`object`	the output of `mave` or the output of `mave.dim`
`dim`	the dimension of CS or CMS. The value of dim should be given when the class of the argument dr is mave. When the class of the argument dr is mave.dim and dim is not given, the function will return the basis matrix of CS or CMS of dimension selected by `mave.dim`. Note that the dimension should be > 0.
`...`	no use.

Value

dir the matrix of CS or CMS of given dimension

Examples

x <- matrix(rnorm(400),100,4)
y <- x[,1]+x[,2]+as.matrix(rnorm(100))
dr <- mave(y~x)
dir3 <- coef(dr,3)

dr.dim <- mave.dim(dr)
dir3 <- coef(dr.dim,3)
dir.best <- coef(dr.dim)

x <- matrix(rnorm(400),100,4)
y <- x[,1]+x[,2]+as.matrix(rnorm(100))
dr <- mave(y~x)
dir3 <- coef(dr,3)

dr.dim <- mave.dim(dr)
dir3 <- coef(dr.dim,3)
dir.best <- coef(dr.dim)

Concrete Compressive Strength Data Set

Description

Concrete strength is very important in civil engineering and is a highly nonlinear function of age and ingredients. This dataset contains 1030 instances and there are 8 features relevant to concrete strength. The description of the varaibles are given below. The description is from https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength. Name – Data Type – Measurement – Description

Format

A data frame with 1030 rows and 8 covariate variables and 1 response variable

Details

Cement (component 1) – quantitative – kg in a m3 mixture – Input Variable

Blast Furnace Slag (component 2) – quantitative – kg in a m3 mixture – Input Variable

Fly Ash (component 3) – quantitative – kg in a m3 mixture – Input Variable

Water (component 4) – quantitative – kg in a m3 mixture – Input Variable

Superplasticizer (component 5) – quantitative – kg in a m3 mixture – Input Variable

Coarse Aggregate (component 6) – quantitative – kg in a m3 mixture – Input Variable

Fine Aggregate (component 7) – quantitative – kg in a m3 mixture – Input Variable

Age – quantitative – Day (1~365) – Input Variable

Concrete compressive strength – quantitative – MPa – Output Variable

Source

https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength

References

-Cheng Yeh, "Modeling of strength of high performance concrete using artificial neural networks," Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998).

Examples

data(Concrete)
train = sample(1:1030)[1:500]
x.train = as.matrix(Concrete[train,1:8])
y.train = as.matrix(Concrete[train,9])
x.test  = as.matrix(Concrete[-train,1:8])
y.test  = as.matrix(Concrete[-train,9])

dr = mave.compute(x.train,y.train, method='meanopg',max.dim=8)
dr.dim = mave.dim(dr)
y.pred = predict(dr.dim,x.test)
#estimation error
mean((y.pred-y.test)^2)

data(Concrete)
train = sample(1:1030)[1:500]
x.train = as.matrix(Concrete[train,1:8])
y.train = as.matrix(Concrete[train,9])
x.test  = as.matrix(Concrete[-train,1:8])
y.test  = as.matrix(Concrete[-train,9])

dr = mave.compute(x.train,y.train, method='meanopg',max.dim=8)
dr.dim = mave.dim(dr)
y.pred = predict(dr.dim,x.test)
#estimation error
mean((y.pred-y.test)^2)

House price in King County, USA

Description

A data set contains 21613 obervations with 19 features plus house price. The names of the columns are given below.

id
date: Date house was sold(String)
price: Price of the sold house
bedrooms: Numer of Bedrooms
bathrooms: Numer of bathrooms
sqft_living: Square footage of the living room
sqrt_log: Square footage of the log
floors: Total floors in the house
waterfront: Whether the house has a view a waterfront(1: yes, 0: not)
view: unknown
condtion: Condition of the house
grade: unknown
sqft_above: Square footage of house apart from basement
sqft_basement: Square footage of the basement
yr_built: Built year
yr_renovated: Year when the house was renovated
zipcode: zipcode of the house
lat: Latitude coordinate
long Longitude coordinate
sqft_living15: Living room area in 2015(implies some renovations)
sqrt_lot15: Lot area in 2015(implies some renovations)

Format

A data frame with 21613 rows and 19 variables

Source

https://www.kaggle.com/harlfoxem/housesalesprediction

Examples

data(kc_house_data)
#convert date in string to date in numeric value
kc_house_data[,2]=sapply(kc_house_data[,2],as.double)
train = sample(1:21613)[1:1000]
x.train = as.matrix(kc_house_data[train,c(2,4:21)]) #exclude id, house price
y.train = as.matrix(kc_house_data[train,3]) # house price
x.test  = as.matrix(kc_house_data[-train,c(2,4:21)])
y.test  = as.matrix(kc_house_data[-train,3])
data(kc_house_data)
#convert date in string to date in numeric value
kc_house_data[,2]=sapply(kc_house_data[,2],as.double)
train = sample(1:21613)[1:1000]
x.train = as.matrix(kc_house_data[train,c(2,4:21)]) #exclude id, house price
y.train = as.matrix(kc_house_data[train,3]) # house price
x.test  = as.matrix(kc_house_data[-train,c(2,4:21)])
y.test  = as.matrix(kc_house_data[-train,3])

Dimension reduction

Description

This function provides several methods to estimate the central space or central mean space of y on x. It returns the matrix of central space or central mean space for different dimensions and contains other information used for dimension selection by mave.dim.

Usage

mave(
  formula,
  data,
  method = "CSOPG",
  max.dim = 10,
  screen = NULL,
  subset,
  na.action = na.fail
)

mave.compute(
  x,
  y,
  method = "CSOPG",
  max.dim = 10,
  screen = nrow(x)/log(nrow(x))
)
mave(
  formula,
  data,
  method = "CSOPG",
  max.dim = 10,
  screen = NULL,
  subset,
  na.action = na.fail
)

mave.compute(
  x,
  y,
  method = "CSOPG",
  max.dim = 10,
  screen = nrow(x)/log(nrow(x))
)

Arguments

`formula`	the model used in regression
`data`	the data
`method`	This parameter specify which method will be used in dimension reduction. It provides five methods, including "csMAVE","csOPG","meanOPG","meanMAVE","KSIR" by default, method = 'csOPG' 'meanOPG' and 'meanMAVE' estimate dimension reduction space for conditional mean 'csMAVE' and 'csOPG' estimate the central dimension reduction space 'KSIR' is a kernel version of sliced inverse regression (Li, 1991). It is fast, but with poor accuracy.
`max.dim`	the maximum dimension of dimension reduction space. The default is 10. In practice, max.dim will be equal to min(max.dim,ncol(x),screen).
`screen`	specify the number of variables retained after screening method. The default is n/log(n). When this number is smaller than max.dim, then max.dim will change to the value of screen
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`na.action`	a function which indicates what should happen when the data contain NAs. The default is na.action, which wil stop calculations. If na.action is set to be na.omit, the incomplete cases will be removed.
`x`	The n by p design matrix.
`y`	The n by q respond matrix.

Value

dr is a list which contains:

dir: dir[[d]] is the central space with d-dimension d = 1, 2, ..., p reduced direction of different dimensions
y: the value of response
idx: the index of variables which survives after screening
max.dim: the largest dimensions of CS or CMS which have been calculated in mave function
ky: parameter used for DIM for selection
x: the original training data

References

Li K C. Sliced inverse regression for dimension reduction[J]. Journal of the American Statistical Association, 1991, 86(414): 316-327.

Xia Y, Tong H, Li W K, et al. An adaptive estimation of dimension reduction space[J]. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2002, 64(3): 363-410.

Xia Y. A constructive approach to the estimation of dimension reduction directions[J]. The Annals of Statistics, 2007: 2654-2690.

Wang H, Xia Y. Sliced regression for dimension reduction[J]. Journal of the American Statistical Association, 2008, 103(482): 811-821.

Examples

 x <- matrix(rnorm(400*5),400,5)
 b1 <- matrix(c(1,1,0,0,0),5,1)
 b2 <- matrix(c(0,0,1,1,0),5,1)
 eps <- matrix(rnorm(400),400,1)
 y <- x%*%b1 + (x%*%b2)*eps

 #finding central space based on OPG method
 #dr.csopg <- mave.compute(x,y, method = 'csopg')
 #or
 dr.csopg <- mave(y ~ x, method = 'csopg')

 #dr.meanopg <- mave.compute(x,y, method = 'meanopg')
 #or
 dr.meanopg <- mave(y ~ x, method = 'meanopg')

 #find central mean space based on ksir method
 dr.ksir <- mave(y~x,method='ksir')
 #or
 #dr.ksir <- mave.compute(x,y,method='ksir')

 #See more examples about screening and mutiple responses in the vignette
 #Using screening for high dimensional data
 #x <- matrix(rnorm(100*50),100,50)
 #y1 = as.matrix(x[,1])+rnorm(100)*.2
 #y2 = as.matrix(x[,2]+x[,3])*as.matrix(x[,1]+x[,5])+rnorm(100)*.2
 #y = cbind(y1,y2)
 #dr.sc = mave(y~x,method='CSOPG',max.dim=5,screen=20)
 #dr.sc.dim = mave.dim(dr.sc)
 #print the directions of central space with the selected variables
 #dr.sc.dim$dir[[3]][dr.sc$idx,]

x <- matrix(rnorm(400*5),400,5)
 b1 <- matrix(c(1,1,0,0,0),5,1)
 b2 <- matrix(c(0,0,1,1,0),5,1)
 eps <- matrix(rnorm(400),400,1)
 y <- x%*%b1 + (x%*%b2)*eps

 #finding central space based on OPG method
 #dr.csopg <- mave.compute(x,y, method = 'csopg')
 #or
 dr.csopg <- mave(y ~ x, method = 'csopg')

 #dr.meanopg <- mave.compute(x,y, method = 'meanopg')
 #or
 dr.meanopg <- mave(y ~ x, method = 'meanopg')

 #find central mean space based on ksir method
 dr.ksir <- mave(y~x,method='ksir')
 #or
 #dr.ksir <- mave.compute(x,y,method='ksir')

 #See more examples about screening and mutiple responses in the vignette
 #Using screening for high dimensional data
 #x <- matrix(rnorm(100*50),100,50)
 #y1 = as.matrix(x[,1])+rnorm(100)*.2
 #y2 = as.matrix(x[,2]+x[,3])*as.matrix(x[,1]+x[,5])+rnorm(100)*.2
 #y = cbind(y1,y2)
 #dr.sc = mave(y~x,method='CSOPG',max.dim=5,screen=20)
 #dr.sc.dim = mave.dim(dr.sc)
 #print the directions of central space with the selected variables
 #dr.sc.dim$dir[[3]][dr.sc$idx,]

The reduced data matrix

Description

The function returns the reduced data matrix of the original data. The reduced data matrix is obtained by the original data multiplied by the dimension reduction directions of given dimension.

Usage

mave.data(dr, x, dim = NULL)
mave.data(dr, x, dim = NULL)

Arguments

`dr`	the object returned by `mave` or `mave.dim`
`x`	the original data matrix of p dimensions
`dim`	the dimension of the reduced data matrix.

Examples


x <- matrix(rnorm(400),100,4)
y <- x[,1]+x[,2]+as.matrix(rnorm(100))
dr <- mave(y~x)
x.reduced <- mave.data(dr,x,3)
x <- matrix(rnorm(400),100,4)
y <- x[,1]+x[,2]+as.matrix(rnorm(100))
dr <- mave(y~x)
x.reduced <- mave.data(dr,x,3)

Select best direction using cross-validation

Description

This function selects the dimension of the central (mean) space based on the calculation of MAVE using cross-validation method.

Usage

mave.dim(dr, max.dim = 10)
mave.dim(dr, max.dim = 10)

Arguments

`dr`	the result of MAVE function
`max.dim`	the maximum dimension for cross-validation.

Value

dr.dim contains all information in dr plus cross-validation values of corresponding direction

cv0 : the cross-validation value when the null model is used
cv : the cross-validation value using dimension reduction directions of different dimensions
dim.min : the dimension of minimum cross-validation value. Note that this value can be 0.

Examples

 x <- matrix(rnorm(400*5),400,5)
 b1 <- matrix(c(1,1,0,0,0),5,1)
 b2 <- matrix(c(0,0,1,1,0),5,1)
 eps <- matrix(rnorm(400),400,1)
 y <- x%*%b1 + (x%*%b2)*eps

 #seleted dimension of central space
 dr.cs <- mave(y~x,method='csmave')
 dr.cs.dim <- mave.dim(dr.cs)

 #seleted dimension of central mean space
 dr.mean <- mave(y~x,method='meanmave')
 dr.mean.dim <- mave.dim(dr.mean)
x <- matrix(rnorm(400*5),400,5)
 b1 <- matrix(c(1,1,0,0,0),5,1)
 b2 <- matrix(c(0,0,1,1,0),5,1)
 eps <- matrix(rnorm(400),400,1)
 y <- x%*%b1 + (x%*%b2)*eps

 #seleted dimension of central space
 dr.cs <- mave(y~x,method='csmave')
 dr.cs.dim <- mave.dim(dr.cs)

 #seleted dimension of central mean space
 dr.mean <- mave(y~x,method='meanmave')
 dr.mean.dim <- mave.dim(dr.mean)

Plot of mave or mave.dim object

Description

Plot the scatterplot of given dimension directions and reponse variables.

Usage

## S3 method for class 'mave'
plot(x, dim = 4, plot.method = pairs, ...)

## S3 method for class 'mave.dim'
plot(x, dim = "dim.min", plot.method = pairs, ...)
## S3 method for class 'mave'
plot(x, dim = 4, plot.method = pairs, ...)

## S3 method for class 'mave.dim'
plot(x, dim = "dim.min", plot.method = pairs, ...)

Arguments

`x`	the object returned by mave
`dim`	the dimension
`plot.method`	the method for plotting scatter plot. The default is 'pairs'
`...`	arguments passed to the plot.method.

Examples

x = matrix(rnorm(2000),400,5)
beta1 = as.matrix(c(1,1,0,0,0))
beta2 = as.matrix(c(0,0,1,1,0))
err = as.matrix(rnorm(400))
y = (x%*%beta1)^2+x%*%beta2+err
dr = mave(y~x, method = 'meanopg')
dr.dim = mave.dim(dr)
plot(dr,dim=3)
plot(dr.dim)
x = matrix(rnorm(2000),400,5)
beta1 = as.matrix(c(1,1,0,0,0))
beta2 = as.matrix(c(0,0,1,1,0))
err = as.matrix(rnorm(400))
y = (x%*%beta1)^2+x%*%beta2+err
dr = mave(y~x, method = 'meanopg')
dr.dim = mave.dim(dr)
plot(dr,dim=3)
plot(dr.dim)

Make predictions based on the dimension reduction space

Description

This method make predictions based the reduced dimension of data using mars function.

Usage

## S3 method for class 'mave'
predict(object, newx, dim, ...)

## S3 method for class 'mave.dim'
predict(object, newx, dim = "dim.min", ...)
## S3 method for class 'mave'
predict(object, newx, dim, ...)

## S3 method for class 'mave.dim'
predict(object, newx, dim = "dim.min", ...)

Arguments

`object`	the object of class 'mave'
`newx`	Matrix of the new data to be predicted
`dim`	the dimension of central space or central mean space. The matrix of the original data will be multiplied by the matrix of dimension reduction directions of given dimension. Then the prediction will be made based on the data of given dimensions. The value of dim should be given when the class of the argument dr is mave. When the class of the argument dr is mave.dim and dim is not given, the function will return the basis matrix of CS or CMS of dimension selected by `mave.dim`
`...`	further arguments passed to `mars` function such as degree.

Value

the prediced response of the new data

Examples


X = matrix(rnorm(10000),1000,10)
beta1 = as.matrix(c(1,1,1,1,0,0,0,0,0,0))
beta2 = as.matrix(c(0,0,0,1,1,1,1,1,0,0))
err = as.matrix(rnorm(1000))
Y = X%*%beta1+X%*%beta2+err

train = sample(1:1000)[1:500]
x.train = X[train,]
y.train = as.matrix(Y[train])
x.test = X[-train,]
y.test = as.matrix(Y[-train])

dr = mave(y.train~x.train, method = 'meanopg')

yp = predict(dr,x.test,dim=3,degree=2)
#mean error
mean((yp-y.test)^2)

dr.dim = mave.dim(dr)

yp = predict(dr.dim,x.test,degree=2)
#mean error
mean((yp-y.test)^2)

X = matrix(rnorm(10000),1000,10)
beta1 = as.matrix(c(1,1,1,1,0,0,0,0,0,0))
beta2 = as.matrix(c(0,0,0,1,1,1,1,1,0,0))
err = as.matrix(rnorm(1000))
Y = X%*%beta1+X%*%beta2+err

train = sample(1:1000)[1:500]
x.train = X[train,]
y.train = as.matrix(Y[train])
x.test = X[-train,]
y.test = as.matrix(Y[-train])

dr = mave(y.train~x.train, method = 'meanopg')

yp = predict(dr,x.test,dim=3,degree=2)
#mean error
mean((yp-y.test)^2)

dr.dim = mave.dim(dr)

yp = predict(dr.dim,x.test,degree=2)
#mean error
mean((yp-y.test)^2)

4601 email record

Description

A dataset containing 4601 record of email with 57 features. These features are the relative frequency of most commonly used phrases and punctions. The data of these features are recorded 1 to 57 columns of the spam data. The outcome is spam or email which is denoted as 1 or 0, recorded in the 58th column of the data.

Format

A data frame with 4601 rows and 57 variables

Examples

data(spam)
train = sample(1:4601)[1:1000]
x.train <- as.matrix(spam[train,1:57])
y.train <- as.matrix(spam[train,58])
x.test <- as.matrix(spam[-train,1:57])
y.test <- as.matrix(spam[-train,58])
x.train <- sqrt(x.train)
x.test <- sqrt(x.test)
data(spam)
train = sample(1:4601)[1:1000]
x.train <- as.matrix(spam[train,1:57])
y.train <- as.matrix(spam[train,58])
x.test <- as.matrix(spam[-train,1:57])
y.test <- as.matrix(spam[-train,58])
x.train <- sqrt(x.train)
x.test <- sqrt(x.test)

Package 'MAVE'

Help Index

Directions of CS or CMS of given dimension

Description

Usage

Arguments

Value

See Also

Examples

Concrete Compressive Strength Data Set

Description

Format

Details

Source

References

Examples

House price in King County, USA

Description

Format

Source

Examples

Dimension reduction

Description

Usage

Arguments

Value

References

See Also

Examples

The reduced data matrix

Description

Usage

Arguments

See Also

Examples

Select best direction using cross-validation

Description

Usage

Arguments

Value

See Also

Examples

Plot of mave or mave.dim object

Description

Usage

Arguments

See Also

Examples

Make predictions based on the dimension reduction space

Description

Usage

Arguments

Value

See Also

Examples

4601 email record

Description

Format

Examples