Title: | Model-Based Response Dimension Reduction |
---|---|
Description: | Functions for model-based response dimension reduction. The usual dimension reduction methods in multivariate regression focus on reducing the predictors, not the responses. Response dimension reduction is theoretically founded in Yoo and Cook (2008) <doi:10.1016/j.csda.2008.07.029>. Later, three model-based response dimension reduction approaches were proposed in Yoo (2018) <doi:10.1080/02331888.2017.1410152> and Yoo (2019) <doi:10.1016/j.jkss.2019.02.001>. The method by Yoo and Cook (2008) is based on non-parametric ordinary least squares, whereas the model-based approaches use maximum likelihood estimation. For two of the model-based methods, principal fitted response reduction and unstructured principal fitted response reduction, chi-squared tests are provided for determining the dimension of the response subspace. |
Authors: | Jae Keun Yoo |
Maintainer: | Jae Keun Yoo <[email protected]> |
License: | GPL (>= 2.0) |
Version: | 1.1.1 |
Built: | 2024-12-11 07:01:58 UTC |
Source: | CRAN |
Returns a matrix used in principal fitted response reduction and unstructured principal fitted response reduction.
choose.fx(X, fx.choice=1, nclust = 5)
X |
the predictor matrix |
fx.choice |
four choices for fx; see below |
nclust |
the number of clusters; see below |
Both principal fitted response reduction and unstructured principal fitted response reduction require a choice of fx. The function returns one of four choices of fx, which are popular candidates among many.
fx.choice=1: This is the default and returns the original predictor matrix X, centered at zero, as fx.
fx.choice=2: This returns the original predictor matrix X, centered at zero, together with its squared values.
fx.choice=3: This returns the original predictor matrix X, centered at zero, together with its exponentiated values.
fx.choice=4: This clusters X with the K-means algorithm, with the number of clusters equal to the value of nclust. The cluster results are then expanded to dummy variables, like a factor used in the lm function. Finally, it returns a categorical basis with nclust-1 columns. The nclust option works only with fx.choice=4.
A matrix for fx.
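The four choices described above can be sketched in base R. This is only an illustration under stated assumptions, not the package source: the name sketch_fx is hypothetical, and it assumes that the squared and exponentiated terms are computed from the centered X and appended as extra columns, which the description does not state explicitly.

```r
# Hypothetical re-implementation for illustration; not the mbrdr source.
sketch_fx <- function(X, fx.choice = 1, nclust = 5) {
  X  <- as.matrix(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)   # column-centered predictors
  attr(Xc, "scaled:center") <- NULL
  switch(as.character(fx.choice),
    "1" = Xc,                                    # centered X only
    "2" = cbind(Xc, Xc^2),                       # assumption: squares of centered X
    "3" = cbind(Xc, exp(Xc)),                    # assumption: exp of centered X
    "4" = {
      cl <- factor(kmeans(X, centers = nclust)$cluster)
      model.matrix(~ cl)[, -1, drop = FALSE]     # nclust - 1 dummy columns
    },
    stop("fx.choice must be 1, 2, 3 or 4"))
}

set.seed(1)
X <- matrix(rnorm(60), nrow = 20, ncol = 3)      # synthetic predictors
dim(sketch_fx(X))                                # 20 x 3
dim(sketch_fx(X, fx.choice = 2))                 # 20 x 6
dim(sketch_fx(X, fx.choice = 4, nclust = 3))     # 20 x 2
```

The dummy coding drops the first cluster as the reference level, which is how lm treats a factor by default and why only nclust-1 columns are returned.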
Jae Keun Yoo, [email protected]
data(mps)
X <- mps[,c(5:6,8:14)]
choose.fx(X)
choose.fx(X, fx.choice=2)
choose.fx(X, fx.choice=4, nclust=3)
Returns M^power.
matpower(M, pow)
M |
symmetric matrix |
pow |
power |
The function computes M^power for a symmetric matrix M.
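One common way to compute a power of a symmetric matrix is through its spectral decomposition. The sketch below assumes that approach; whether matpower is implemented this way is not stated here, and the name sketch_matpower is hypothetical. Negative or fractional powers require all eigenvalues of M to be positive.

```r
# Hypothetical helper: M^pow via the eigendecomposition M = V diag(lambda) V^T.
sketch_matpower <- function(M, pow) {
  e <- eigen(M, symmetric = TRUE)
  e$vectors %*% diag(e$values^pow, nrow = length(e$values)) %*% t(e$vectors)
}

set.seed(1)
X <- matrix(rnorm(100), nrow = 20, ncol = 5)   # synthetic data
S <- cov(X)
R <- sketch_matpower(S, -0.5)                  # S^(-1/2)
max(abs(R %*% R - solve(S)))                   # numerically close to zero
```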
Returns M^power.
Jae Keun Yoo, [email protected]
X <- matrix(rnorm(100), nrow = 20, ncol = 5)
matpower(cov(X), -0.5) ## returns cov(X)^-0.5; note that cov(X)^-0.5 %*% cov(X)^-0.5 = cov(X)^-1.
This is the main function in the mbrdr package. It creates objects of class mbrdr to estimate the response mean subspace and perform tests concerning its dimension. Several helper functions that require an mbrdr object can then be applied to the output from this function.
mbrdr(formula, data, subset, na.action = na.fail, weights, ...)
mbrdr.compute(y, x, weights, method = "upfrr", ...)
formula |
a two-sided formula like |
data |
an optional data frame containing the variables in the model. By default the variables are taken from the environment from which ‘mbrdr’ is called. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
weights |
an optional vector of weights to be used where appropriate. In the context of dimension reduction methods, weights are used to obtain elliptical symmetry, not constant variance. |
na.action |
a function which indicates what should happen when the data contain ‘NA’s. The default is ‘na.fail,’ which will stop calculations. The option 'na.omit' is also permitted, but it may not work correctly when weights are used. |
x |
The design matrix. This will be computed from the formula by |
y |
The response vector or matrix |
method |
This character string specifies the method of fitting. The default is "upfrr"; the other methods described below are "yc", "prr" and "pfrr". |
... |
For |
The general regression problem mainly focuses on studying E(y|x), the conditional mean of a response y given a set of predictors x, where y is an r-dimensional response variable with r >= 2 and x is a p-dimensional predictor.
This function provides methods for estimating the response dimension subspace of a general regression problem. That is, we want to find an r x d matrix B of minimal rank d such that E(y|x) = E{P(B)y|x}, where P(B) is the orthogonal projection onto the column space of B. Both the dimension d and the subspace spanned by the columns of B are unknown. These methods make few assumptions.
For the methods "yc", "prr", "pfrr" and "upfrr", B is estimated and returned. Only for "pfrr" and "upfrr" are chi-squared test results for estimating d provided.
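The condition E(y|x) = E{P(B)y|x} can be illustrated on simulated data. The sketch below is not any of the package's estimators; it only assumes a linear mean E(y|x) = B a^T x, with the hypothetical symbols B, a, r = 4, p = 3 and d = 1 chosen for the simulation. Under that model the r x p cross-covariance of y and x has columns in the span of B, so its leading left singular vector should align with B.

```r
set.seed(1)
n <- 500; r <- 4; p <- 3
B <- matrix(c(1, 1, 0, 0) / sqrt(2), ncol = 1)   # true r x d basis, d = 1
a <- c(1, -1, 0.5)                               # hypothetical coefficients
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
# y depends on x only through the direction B, plus isotropic noise
y <- (x %*% a) %*% t(B) + matrix(rnorm(n * r, sd = 0.5), nrow = n, ncol = r)

# Population cross-covariance is B a^T Cov(x), so its columns lie in span(B)
K  <- cov(y, x)
u1 <- svd(K)$u[, 1]                              # leading left singular vector
alignment <- abs(sum(u1 * B))                    # |cosine| of angle to B
alignment                                        # should be close to 1
```

An alignment near 1 indicates that the four responses can be replaced by the single linear combination t(B) %*% y without losing information about the conditional mean in this simulated setting.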
Weights can be used, essentially to specify the relative frequency of each case in the data.
The option fx.choice is required to fit "pfrr" and "upfrr" and takes one of the following four values.
fx.choice=1: This is the default and returns the original predictor matrix X, centered at zero, as fx.
fx.choice=2: This returns the original predictor matrix X, centered at zero, together with its squared values.
fx.choice=3: This returns the original predictor matrix X, centered at zero, together with its exponentiated values.
fx.choice=4: This clusters X with the K-means algorithm, with the number of clusters equal to the value of nclust. The cluster results are then expanded to dummy variables, like a factor used in the lm function. Finally, it returns a categorical basis with nclust-1 columns. The nclust option works only with fx.choice=4.
mbrdr returns an object that inherits from mbrdr (the name of the type is the value of the method argument), with attributes:
y |
The response matrix |
x |
The design matrix |
weights |
The weights used, normalized to add to n. |
cases |
Number of cases used. |
call |
The initial call to |
evectors |
The eigenvectors from kernel matrices to estimate |
evalues |
The eigenvalues corresponding to the eigenvectors. |
stats |
This is the dimension test statistics for |
fx |
This returns the user-selection of fx for |
numdir |
The maximum number of directions to be found. The output value of numdir may be smaller than the input value. |
method |
the dimension reduction method used. |
Jae Keun Yoo, <[email protected]>.
Yoo, JK. (2018). Response dimension reduction: model-based approach. Statistics: A Journal of Theoretical and Applied Statistics, 52, 409-425. (Methods "prr" and "pfrr".)
Yoo, JK. (2019). Unstructured principal fitted response reduction in multivariate regression. Journal of the Korean Statistical Society, 48, 561-567. (Method "upfrr".)
Yoo, JK. and Cook, R. D. (2008). Response dimension reduction for the conditional mean in multivariate regression. Statistics and Probability Letters, 47, 381-389. (Method "yc".)
data(mps)
# default fitting method is "upfrr"
s0 <- mbrdr(cbind(A4, B4, A6, B6) ~ AFDC + Attend + B + Enrol + HS + Minority + Mobility + Poverty + PTR, data = mps)
summary(s0)
# Refit, using a different choice of fx.
summary(s1 <- update(s0, fx.choice = 2))
# Refit again, using pfrr with fx.choice=1
summary(s2 <- update(s1, method = "pfrr", fx.choice = 1))
# Refit, using prr, which does not require the choice of fx.
summary(s3 <- update(s1, method = "prr"))
# Fit using the Yoo-Cook method:
summary(s4 <- update(s1, method = "yc"))
Accessor functions for mbrdr objects.
mbrdr.x(object)
mbrdr.y(object)
object |
An object that inherits from |
Returns a component of an mbrdr object. mbrdr.x returns the predictor matrix reduced to full rank by dropping trailing columns; mbrdr.y returns the response vector/matrix.
Jae Keun Yoo, <[email protected]>
The Minneapolis school dataset was collected to evaluate the performance of students in 63 Minneapolis schools in 1972. The dataset was reported in the Star-Tribune in 1973.
data(mps)
A data frame of dimension 63 x 15. Each row represents one elementary school. The first four columns correspond to percentages of students in a school scoring above (A) and below (B) average on standardized fourth and sixth grade reading comprehension tests. Subtracting either pair of grade-specific percentages from 100 gives the percentage of students scoring about average on the test. All the other variables are demographic information for each school.
A4 = percentage of 4th graders scoring ABOVE average on a standard 4th grade vocabulary test in 1972.
B4 = percentage of 4th graders scoring BELOW average on a standard 4th grade vocabulary test in 1972.
A6 = percentage of 6th graders scoring ABOVE average on a standard 6th grade comprehension test in 1972.
B6 = percentage of 6th graders scoring BELOW average on a standard 6th grade comprehension test in 1972.
AFDC = percentage of children receiving Aid to Families with Dependent Children
Attend = average percentage of children in attendance during the year
B = percentage of children in the school not living with Both Parents
BthPts = percentage of children in the school living with Both Parents
Enrol = number of children enrolled in the school
HS = percent of adults in the school area who have completed high school
Minority = percent minority children in the area.
Mobility = percentage of children who started in a school, but did not finish there
Poverty = percentage of persons in the school area who are above the federal poverty levels
PTR = pupil-teacher ratio
School = names of school
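The format description notes that subtracting a grade-specific pair of percentages from 100 gives the percentage of students scoring about average. A minimal sketch of that arithmetic follows; the values are made up for two hypothetical schools and are not taken from mps, and the column names Avg4 and Avg6 are introduced only for illustration.

```r
# Made-up values for two hypothetical schools (not rows of mps)
toy <- data.frame(A4 = c(20, 35), B4 = c(45, 25),
                  A6 = c(18, 40), B6 = c(50, 22))

# Percentage scoring about average = 100 - (above + below), per grade
toy$Avg4 <- 100 - (toy$A4 + toy$B4)   # 35 and 40
toy$Avg6 <- 100 - (toy$A6 + toy$B6)   # 32 and 38
toy[, c("Avg4", "Avg6")]
```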
Cook, R. D. and Setodji, C. M. (2003) A model-free test for reduced rank in multivariate regression. Journal of the American Statistical Association, 98, pp. 340-351.
Yoo, JK. (2019) Unstructured principal fitted response reduction in multivariate regression. Journal of the Korean Statistical Society, 48, pp. 561-567.
data(mps)
pairs(mps[,1:4])
Returns Sigmahat, Sigmahat_fit and Sigmahat_res for principal fitted response reduction and unstructured principal fitted response reduction using the choice of fx.
SIGMAS(Y, fx)
Y |
the response matrix |
fx |
the chosen fx |
Both principal fitted response reduction and unstructured principal fitted response reduction require computing several SIGMAs. The SIGMAs are as follows: Sigmahat = (Y^T Y)/n; Sigmahat_fit = (Y^T P_fx Y)/n; Sigmahat_res = Sigmahat - Sigmahat_fit.
A list of Sigmahat, Sigmahat_fit and Sigmahat_res.
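The three matrices can be sketched directly from their definitions. The code below is an illustration, not the package source: the name sketch_sigmas is hypothetical, and it assumes that Y is column-centered before use and that P_fx denotes the orthogonal projection onto the column space of fx, neither of which is spelled out in the formulas above.

```r
# Hypothetical helper following the three definitions above.
sketch_sigmas <- function(Y, fx) {
  Y  <- scale(as.matrix(Y), center = TRUE, scale = FALSE)  # assumption: centered Y
  fx <- as.matrix(fx)
  n  <- nrow(Y)
  PY <- qr.fitted(qr(fx), Y)                # P_fx Y: projection of Y onto span(fx)
  Sigmahat     <- crossprod(Y) / n          # Y^T Y / n
  Sigmahat_fit <- crossprod(PY) / n         # Y^T P_fx Y / n, since P_fx is idempotent
  list(Sigmahat = Sigmahat,
       Sigmahat_fit = Sigmahat_fit,
       Sigmahat_res = Sigmahat - Sigmahat_fit)
}

set.seed(1)
X  <- matrix(rnorm(150), nrow = 50, ncol = 3)              # synthetic predictors
Y  <- X %*% matrix(rnorm(12), 3, 4) + matrix(rnorm(200), 50, 4)
fx <- scale(X, center = TRUE, scale = FALSE)               # plays the role of fx.choice=1
S  <- sketch_sigmas(Y, fx)
eigen(S$Sigmahat_res, symmetric = TRUE)$values             # all non-negative
```

Because Sigmahat_res is the covariance of the residuals after projecting Y onto fx, its eigenvalues are non-negative up to rounding error.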
Jae Keun Yoo, [email protected]
data(mps)
X <- mps[,c(5:6,8:14)]
Y <- mps[,c(1:4)]
fx1 <- choose.fx(X)
fx2 <- choose.fx(X, fx.choice=4, nclust=3)
SIGMAS(Y, fx1)
SIGMAS(Y, fx2)