Title: | Principal Covariates Regression |
---|---|
Description: | Analyzing regression data with many and/or highly collinear predictor variables, by simultaneously reducing the predictor variables to a limited number of components and regressing the criterion variables on these components (de Jong S. & Kiers H. A. L. (1992) <doi:10.1016/0169-7439(92)80100-I>). Several rotation and model selection options are provided. |
Authors: | Marlies Vervloet [aut, cre], Henk Kiers [aut], Eva Ceulemans [ctb] |
Maintainer: | Kristof Meers |
License: | GPL (>= 2) |
Version: | 2.7.2 |
Built: | 2024-11-19 06:53:10 UTC |
Source: | CRAN |
Analyzing regression data with many and/or highly collinear predictor variables, by simultaneously reducing the predictor variables to a limited number of components and regressing the criterion variables on these components. Several rotation options are provided in this package, as well as model selection options.
This package contains the function pcovr, which runs a full PCovR analysis of a data set and provides several preprocessing, model selection, and rotation options. This function calls the function pcovr_est, which estimates the PCovR parameters given a specific weighting parameter value and a particular number of components. This function was originally written in MATLAB by De Jong & Kiers (1992). Two illustrative data sets are included: alexithymia and psychiatrists.
Marlies Vervloet
S. de Jong, H.A.L. Kiers, Principal covariates regression: Part I. Theory, Chemom. intell. lab. syst 14 (1992) 155-164.
Marlies Vervloet, Henk A. Kiers, Wim Van den Noortgate, Eva Ceulemans (2015). PCovR: An R Package for Principal Covariates Regression. Journal of Statistical Software, 65(8), 1-14. URL http://www.jstatsoft.org/v65/i08/.
data(alexithymia)
results <- pcovr(alexithymia$X, alexithymia$Y)
summary(results)
plot(results)
The data contain the scores of 122 Belgian psychology students on the 20-item Toronto Alexithymia Scale (TAS-20; Bagby, Parker, & Taylor, 1994), which measures the inability to recognize and verbalize emotions, the Center for Epidemiological Studies Depression Scale (CES-D; Radloff, 1977), and the Rosenberg Self-Esteem Scale (RSE; Rosenberg, 1989). These data can be used to examine the extent to which the degree of depressive symptomatology (measured by the total CES-D score), and the degree of self-esteem (measured by the total RSE-score), can be predicted by the separate items of the TAS-20. We investigate the individual items because Bankier, Aigner and Bach (2001) emphasize that alexithymia is a multidimensional construct and authors disagree about the number and nature of the dimensions.
data(alexithymia)
List of 2
confused
I am often confused about what emotion I am feeling
right words
It is difficult for me to find the right words for my feelings
sensations
I have physical sensations that even doctors don't understand
describe
I am able to describe my feelings easily
analyze problems
I prefer to analyze problems rather than just describe them
upset
When I am upset, I don't know if I am sad, frightened, or angry
puzzled
I am often puzzled by sensations in my body
let happen
I prefer to just let things happen rather than to understand why they turned out that way
let happen
I have feelings that I can't quite identify
essential
Being in touch with emotions is essential
feel about people
I find it hard to describe how I feel about people
describe more
People tell me to describe my feelings more
going on
I don't know what's going on inside me
why angry
I often don't know why I am angry
daily activities
I prefer talking to people about their daily activities rather than their feelings
entertainment
I prefer to watch "light" entertainment shows rather than psychological dramas
reveal feelings
It is difficult for me to reveal my innermost feelings, even to close friends
close
I can feel close to someone, even in moments of silence
useful
I find examination of my feelings useful in solving personal problems
hidden meanings
Looking for hidden meanings in movies or plays distracts from their enjoyment
CES-D
Degree of depressive symptomatology
RSE
Degree of self-esteem
Bagby, R. M., Parker, J. D., & Taylor, G. J. (1994). The twenty-item Toronto Alexithymia Scale: Item selection and cross-validation of the factor structure. Journal of Psychosomatic Research, 38(1), 23-32.
Bankier, B., Aigner, M., & Bach, M. (2001). Alexithymia in DSM-IV Disorder: Comparative Evaluation of Somatoform Disorder, Panic Disorder, Obsessive-Compulsive Disorder, and Depression. Psychosomatics, 42(3), 235-240.
Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1(3), 385-401.
Rosenberg, M. (1989). Society and the adolescent self-image. Middletown: Wesleyan University Press.
Marlies Vervloet, Henk A. Kiers, Wim Van den Noortgate, Eva Ceulemans (2015). PCovR: An R Package for Principal Covariates Regression. Journal of Statistical Software, 65(8), 1-14. URL http://www.jstatsoft.org/v65/i08/.
data(alexithymia)
str(alexithymia)
Estimating the ratio of the error variance of the predictor block, versus the error variance of the criterion block.
ErrorRatio(X, Y, Rmin = 1, Rmax = ncol(X)/3, prepX="stand",prepY="stand")
X |
Dataframe containing predictor scores |
Y |
Dataframe containing criterion scores |
Rmin |
Lowest number of components considered |
Rmax |
Highest number of components considered |
prepX |
Preprocessing of predictor scores: standardizing (stand) or centering data (cent) |
prepY |
Preprocessing of criterion scores: standardizing (stand) or centering data (cent) |
An estimate for the error variance of X can be obtained by applying principal component analysis to X and determining the optimal number of components through a scree test; the estimate equals the associated percentage of unexplained variance. The estimate for the error variance of Y boils down to the percentage of unexplained variance when Y is regressed on X. This approach for estimating the error variances of X and Y is based on the work of Wilderjans, Ceulemans, Van Mechelen, and Van den Berg (2011).
The returned value is the estimated error variance of X, divided by the estimated error variance of Y.
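The two estimates described above can be sketched in base R. This is only an illustration of the idea, not the code of ErrorRatio: the data are simulated, the number of retained principal components (n_comp) is fixed by hand rather than chosen with a scree test, and all object names are hypothetical.

```r
set.seed(1)
n <- 100

# Simulated data: 6 predictors driven by 2 latent components plus noise
scores <- matrix(rnorm(n * 2), n, 2)
X <- scale(scores %*% matrix(rnorm(2 * 6), 2, 6) +
             0.5 * matrix(rnorm(n * 6), n, 6))
Y <- scale(X[, 1:2] %*% c(1, -1) + rnorm(n))

# Error variance of X: share of variance NOT explained by the retained
# principal components (assumption: n_comp is fixed instead of scree-tested)
n_comp <- 2
d2 <- svd(X)$d^2
err_x <- 1 - sum(d2[seq_len(n_comp)]) / sum(d2)

# Error variance of Y: share of variance NOT explained when Y is regressed on X
fit <- lm.fit(cbind(1, X), Y)
err_y <- sum(fit$residuals^2) / sum(Y^2)

# Ratio of the two estimates (X over Y), analogous to the returned value
ratio_sketch <- err_x / err_y
print(ratio_sketch)
```

Both estimates are proportions here; expressing them as percentages would leave the ratio unchanged.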
Marlies Vervloet
Wilderjans, T. F., Ceulemans, E., Van Mechelen, I., & Van den Berg, R. A. (2011). Simultaneous analysis of coupled data matrices subject to different amounts of noise. British Journal of Mathematical and Statistical Psychology, 64, 277-290.
Marlies Vervloet, Henk A. Kiers, Wim Van den Noortgate, Eva Ceulemans (2015). PCovR: An R Package for Principal Covariates Regression. Journal of Statistical Software, 65(8), 1-14. URL http://www.jstatsoft.org/v65/i08/.
data(psychiatrists)
ratio <- ErrorRatio(psychiatrists$X, psychiatrists$Y)
Application of a PCovR analysis consists of the following steps: preprocessing the data, running PCovR analyses with different numbers of components and/or weighting parameter values, performing model selection, and rotating the retained solution for easier interpretation.
pcovr(X, Y, modsel = "seq", Rmin = 1, Rmax = ncol(X)/3, R = NULL, weight = NULL,
      rot = "varimax", target = NULL, prepX = "stand", prepY = "stand",
      ratio = "estimation", fold = "LeaveOneOut", zeroloads = ncol(X))

## S3 method for class 'pcovr'
plot(x, cpal = NULL, lpal = NULL, ...)
X |
Dataframe containing predictor scores |
Y |
Dataframe containing criterion scores |
modsel |
Model selection procedure (seq, seqRcv, seqAcv or sim) |
Rmin |
Lowest number of components considered |
Rmax |
Highest number of components considered |
R |
Number of components (overrules Rmin and Rmax) |
weight |
Weighting values considered |
rot |
Rotation criterion (varimax, quartimin, targetT, targetQ, wvarim, promin or none) |
target |
Target matrix for target rotation (components x predictor variables) |
prepX |
Preprocessing of predictor scores: standardizing (stand) or centering data (cent) |
prepY |
Preprocessing of criterion scores: standardizing (stand) or centering data (cent) |
ratio |
Ratio of the estimated error variances of the predictor block and the criterion block |
fold |
Value of k when performing k-fold cross-validation. By default, leave-one-out cross-validation is performed. |
zeroloads |
Number of near-zero loadings of the target for simplimax rotation |
x |
An object of the type produced by pcovr |
cpal |
Vector of colours used for model selection plots |
lpal |
Vector of line types used for model selection plots |
... |
Further graphical arguments |
The PCovR package includes two preprocessing options, which can be applied to X and/or Y. Specifically, it is possible to only center the data (prepX="cent", prepY="cent"). However, the default option is to standardize the data (prepX="stand", prepY="stand"), which implies that X and/or Y are centered and normalized (i.e., each variable has a mean of zero and a standard deviation of one).
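The difference between the two preprocessing options can be illustrated with base R's scale(). This is a sketch on a made-up matrix, not the package's internal code; note that scale() divides by the sample standard deviation (denominator n - 1), which may differ from the normalization used inside pcovr.

```r
# Toy predictor matrix with two variables on very different scales
X <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40), ncol = 2)

# "cent": subtract the column means only
X_cent <- scale(X, center = TRUE, scale = FALSE)

# "stand": subtract the column means and divide by the column standard deviations
X_stand <- scale(X, center = TRUE, scale = TRUE)

colMeans(X_cent)        # zero means, original spread retained
apply(X_cent, 2, sd)    # standard deviations still differ between columns
apply(X_stand, 2, sd)   # standard deviations equal to one
```

Centering keeps the original differences in spread between variables, whereas standardizing gives every variable the same weight in the analysis.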
The fastest and therefore default model selection setting (modsel="seq") implies a sequential procedure in which the weighting value is determined on the basis of maximum likelihood principles (Vervloet, Van Deun, Van den Noortgate, & Ceulemans, 2013), but taking the weighting values entered by the user (i.e., specified with the parameter weight) into account. Specifically, if the maximum likelihood weighting value does not equal one of the entered values, the entered weighting value that is closest to it (in absolute sense) is used. Note that the error variance ratio is by default estimated with the function ErrorRatio, but can be specified otherwise with the parameter ratio. However, this is only possible for datasets with more observations than predictor variables. Among all models with the selected weighting value and a number of components between Rmin and Rmax, the solution with the highest scree test (st) value is picked (Cattell, 1966; Wilderjans, Ceulemans, & Meers, 2012). However, models for which the fit is less than 1% better than the fit of a less complex model are excluded. Note that the assessment of the optimal number of components can be overruled if one is only interested in solutions with a particular number of components. In particular, when the input parameter R is specified, Rmin and Rmax are ignored, and the specified number of components is used when running the analysis and determining the weighting value.
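The two steps of the sequential procedure can be sketched with made-up numbers. The maximum likelihood weighting value (alpha_ml) and the fit values below are hypothetical, and the scree ratio is written in the usual CHull form for equally spaced complexities, which is an assumption because this page does not spell out the formula. The 1% exclusion rule is not implemented here.

```r
# Step 1: snap the maximum likelihood weighting value to the closest entered value
weights  <- c(0.10, 0.50, 0.90)   # values passed via the 'weight' argument
alpha_ml <- 0.62                  # hypothetical maximum likelihood value
alpha_used <- weights[which.min(abs(weights - alpha_ml))]
print(alpha_used)

# Step 2: scree test over the number of components
# Hypothetical fit values for R = 1, ..., 5 components
fit <- c(0.40, 0.70, 0.75, 0.78, 0.80)
R   <- seq_along(fit)

# A scree ratio needs both neighbours, so only R = 2, ..., 4 can be evaluated
inner <- 2:(length(fit) - 1)
st <- (fit[inner] - fit[inner - 1]) / (fit[inner + 1] - fit[inner])

# Retain the number of components with the largest scree ratio
R_best <- R[inner][which.max(st)]
print(R_best)
```

In this toy example the fit gain levels off after two components, so the scree ratio is largest at R = 2.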
The package also provides two sequential procedures that incorporate a cross-validation step (modsel="seqRcv" and modsel="seqAcv"). seqRcv also starts with the selection of the weighting value based on maximum likelihood principles, but in the next step, the number of components is determined using leave-one-out cross-validation. seqAcv is identical to the default procedure, but has an extra step: after the selection of the number of components, leave-one-out cross-validation is applied to choose the weighting value.
The simultaneous procedure (modsel="sim") performs leave-one-out cross-validation for all considered weighting values (weight; by default, 100 values between .01 and 1) and all numbers of components between Rmin (default: 1) and Rmax (default: number of predictors divided by 3). The weighting parameter value and number of components that maximize the cross-validation fit are retained. Note that the parameter fold can be used to alter the number of roughly equal-sized parts into which the data are split for cross-validation (Hastie, Tibshirani, & Friedman, 2001). The default value of fold is "LeaveOneOut", implying that the data are split into N (number of observations) parts.
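The role of fold can be illustrated by assigning observations to cross-validation parts in base R. The helper make_folds is hypothetical and only shows how k roughly equal-sized parts relate to leave-one-out; it is not how the package splits the data internally.

```r
# Randomly assign each of n observations to one of k roughly equal-sized parts
make_folds <- function(n, k) {
  sample(rep(seq_len(k), length.out = n))
}

set.seed(42)
n <- 122                              # number of observations in alexithymia

folds_10  <- make_folds(n, 10)        # 10-fold cross-validation
folds_loo <- make_folds(n, n)         # leave-one-out: every part has size 1

table(folds_10)                       # part sizes differ by at most one

# In each round, one part is held out and the model is fitted on the rest
for (k in 1:2) {
  test_idx  <- which(folds_10 == k)
  train_idx <- which(folds_10 != k)
  cat("round", k, ":", length(train_idx), "training,",
      length(test_idx), "held out\n")
}
```

Smaller values of fold reduce the number of model fits and therefore the computation time, at the cost of smaller training sets in each round.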
The rotation criteria that are implemented in the PCovR package are varimax, quartimin, targetT, targetQ, wvarim and promin. One can also request the original solution by typing rot="none". Target rotation (Browne, 1972) orthogonally rotates the loading matrix towards a target matrix (target) that is specified by the user.
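The idea of orthogonally rotating loadings towards a target can be sketched with the classical orthogonal Procrustes solution. This is a generic illustration and not the routine the package calls for targetT; the loadings and the binary target are simulated, and the matrices are arranged as predictors x components (the transpose of the orientation used for the target argument).

```r
set.seed(3)

# Simulated loadings (12 predictors x 3 components) and a hypothetical target
A      <- matrix(rnorm(12 * 3), 12, 3)
target <- matrix(rbinom(12 * 3, 1, 0.4), 12, 3)

# Orthogonal Procrustes: the rotation Tmat that minimises ||A %*% Tmat - target||
# is U %*% t(V), where t(A) %*% target = U D t(V)
s    <- svd(crossprod(A, target))
Tmat <- s$u %*% t(s$v)

A_rot <- A %*% Tmat

# Tmat is orthogonal, and the rotated loadings are at least as close to the target
round(crossprod(Tmat), 10)
c(before = sum((A - target)^2), after = sum((A_rot - target)^2))
```

Because the rotation is orthogonal, the components stay uncorrelated and the total amount of explained variance is unchanged; only its distribution over the components differs.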
Note that Simplimax requires the specification of a number of zero elements. By default, zeroloads equals the number of predictors.
The interpretation of the obtained solution usually starts with the interpretation of the loading matrix. Specifically, the components are labeled by considering what the predictors that have the highest loadings (in absolute sense), have in common. Given these labels, the regression weights can be interpreted.
pcovr returns a list that contains the following objects (note that some objects can be empty, depending on the model selection settings used):
Px |
Loading matrix (components x predictor variables) |
Py |
Regression weights matrix (components x criterion variables) |
Te |
Component score matrix (observations x components) |
W |
Component weights matrix (predictor variables x components) |
Rx2 |
Proportion of explained variance in X |
Ry2 |
Proportion of explained variance in Y |
Qy2 |
Cross-validation fit as a function of weighting parameter and number of components (weighting parameter x number of components) |
VAFsum |
Weighted sum of the variance accounted for in X and in Y as a function of number of components (1 x number of components) |
alpha |
Selected value of the weighting parameter |
R |
Selected number of components |
modsel |
Model selection procedure that was used |
rot |
Rotation criterion that was used |
prepX |
Method of preprocessing that was used for the predictor scores |
prepY |
Method of preprocessing that was used for the criterion scores |
Rvalues |
Numbers of components that were considered |
Alphavalues |
Weighting parameter values that were considered |
Marlies Vervloet
Browne, M. W. (1972). Oblique rotation to a partially specified target. British Journal of Mathematical and Statistical Psychology, 25(2), 207-212.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245-276.
De Jong, S., & Kiers, H. A. (1992). Principal covariates regression: Part I. Theory. Chemometrics and Intelligent Laboratory Systems, 14, 155-164.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference and prediction. New York: Springer.
Vervloet, M., Van Deun, K., Van den Noortgate, W., & Ceulemans, E. (2013). On the selection of the weighting parameter value in Principal Covariates Regression. Chemometrics and Intelligent Laboratory Systems.
Marlies Vervloet, Henk A. Kiers, Wim Van den Noortgate, Eva Ceulemans (2015). PCovR: An R Package for Principal Covariates Regression. Journal of Statistical Software, 65(8), 1-14. URL http://www.jstatsoft.org/v65/i08/.
Wilderjans, T. F., Ceulemans, E., & Meers, K. (2012). CHull: A generic convex-hull-based model selection method. Behavior Research Methods.
data(alexithymia)
results <- pcovr(alexithymia$X, alexithymia$Y)
summary(results)
plot(results)
Analyzing regression data with many and/or highly collinear predictor variables, by simultaneously reducing the predictor variables to a given number of components and regressing the criterion variables on these components. A weighting parameter value is specified that determines the extent to which both aspects influence the solution. Cross-validation (Hastie, Tibshirani & Friedman, 2001) options are included.
pcovr_est(X, Y, r, a, cross = FALSE, fold = "LeaveOneOut")
X |
Matrix containing predictor scores (observations x predictors) |
Y |
Matrix containing criterion scores (observations x criteria) |
r |
The desired number of components |
a |
The desired weighting parameter value |
cross |
Logical. If TRUE cross-validation is performed |
fold |
Value of k when performing k-fold cross-validation. By default, leave-one-out cross-validation is performed. |
W |
Component weights matrix (predictors x components) |
B |
Regression weights for predictors (predictors x criteria) |
Rx2 |
Proportion of explained variance in X |
Ry2 |
Proportion of explained variance in Y |
Te |
Component score matrix (observations x components) |
Px |
Loading matrix of components (components x predictors) |
Py |
Regression weights matrix (components x criteria) |
Qy2 |
Cross-validation fit |
Marlies Vervloet
De Jong, S., & Kiers, H. A. (1992). Principal covariates regression: Part I. Theory. Chemometrics and Intelligent Laboratory Systems, 14, 155-164.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference and prediction. New York: Springer.
Marlies Vervloet, Henk A. Kiers, Wim Van den Noortgate, Eva Ceulemans (2015). PCovR: An R Package for Principal Covariates Regression. Journal of Statistical Software, 65(8), 1-14. URL http://www.jstatsoft.org/v65/i08/.
data(alexithymia)
X <- data.matrix(alexithymia$X)
Y <- data.matrix(alexithymia$Y)
results <- pcovr_est(X, Y, r = 2, a = .90)
str(results)
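To make the role of the weighting parameter concrete, the estimation step can be sketched in base R through the eigen-decomposition formulation commonly given for PCovR. This is an illustrative reimplementation under stated assumptions, not the code of pcovr_est: the function name pcovr_sketch is hypothetical, X is assumed to have full column rank and more rows than columns, the component scores are scaled to unit length (the package may scale them differently), and no cross-validation is performed.

```r
pcovr_sketch <- function(X, Y, r, a) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)

  # Projection of Y onto the column space of X (assumes full column rank)
  Q    <- qr.Q(qr(X))
  Yhat <- Q %*% crossprod(Q, Y)

  # Weighted combination of the X part and the (projected) Y part
  G <- a * tcrossprod(X) / sum(X^2) + (1 - a) * tcrossprod(Yhat) / sum(Y^2)

  # Component scores: first r eigenvectors of G
  Te <- eigen(G, symmetric = TRUE)$vectors[, seq_len(r), drop = FALSE]

  W  <- qr.coef(qr(X), Te)   # component weights, so that X %*% W equals Te
  Px <- crossprod(Te, X)     # loadings (components x predictors)
  Py <- crossprod(Te, Y)     # regression weights (components x criteria)

  list(W = W, B = W %*% Py, Te = Te, Px = Px, Py = Py,
       Rx2 = 1 - sum((X - Te %*% Px)^2) / sum(X^2),
       Ry2 = 1 - sum((Y - Te %*% Py)^2) / sum(Y^2))
}

# Simulated, standardized data (hypothetical example)
set.seed(7)
X <- scale(matrix(rnorm(50 * 6), 50, 6))
Y <- scale(X[, 1:2] %*% c(1, -1) + rnorm(50))

fit <- pcovr_sketch(X, Y, r = 2, a = 0.90)
str(fit)

# With a = 1 only the reconstruction of X matters, as in principal component analysis
pca <- pcovr_sketch(X, Y, r = 2, a = 1)
```

Values of a close to 1 emphasize the reconstruction of X, whereas values close to 0 emphasize the prediction of Y.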
This is a rotation criterion, developed by Lorenzo-Seva (1999), in which oblique target rotation (tarrotob) is applied using the Weighted Varimax solution (wvarim) as the target matrix.
promin(F1, nrs = 20)
F1 |
Matrix to be rotated |
nrs |
Number of random starts |
Th |
Transformation matrix to the pattern |
loadings |
Rotated matrix |
U |
Transformation matrix to the structure |
Marlies Vervloet
Lorenzo-Seva, U. (1999). Promin: A method for oblique factor rotation. Multivariate Behavioral Research, 34(3), 347-365.
Marlies Vervloet, Henk A. Kiers, Wim Van den Noortgate, Eva Ceulemans (2015). PCovR: An R Package for Principal Covariates Regression. Journal of Statistical Software, 65(8), 1-14. URL http://www.jstatsoft.org/v65/i08/.
Px <- matrix(rnorm(36), 12, 3)
print(Px)
Px_r <- promin(Px)
print(Px_r$loadings)
The data contain the scores of 30 Belgian psychiatric patients on 23 psychiatric symptoms and 4 psychiatric disorders (toxicomania, schizophrenia, depression, and anxiety disorder). Each score is the sumscore of the binary symptom and disorder scores that were given by 15 different psychiatrists. The data can be used to examine the extent to which the degree of toxicomania, schizophrenia, depression and anxiety disorder, can be predicted by the 23 psychiatric symptoms.
data(psychiatrists)
The format is: List of 2
desorganised_speech
agitation
hallucinations
inappropriate
desorientation
depression
fear
suicide
somatic_concern
narcotics
antisocial
retardation
social_isolation
routine
alcohol
negativism
denial
grandeur
suspicion
intellectual_obstruction
impulse_control
social_leveling
occupational_dysfunction
toxicomania
schizophrenia
depression
anxiety_disorder
Van Mechelen, I., & De Boeck, P. (1990). Projection of a binary criterion into a model of hierarchical classes. Psychometrika, 55(4), 677-694.
Marlies Vervloet, Henk A. Kiers, Wim Van den Noortgate, Eva Ceulemans (2015). PCovR: An R Package for Principal Covariates Regression. Journal of Statistical Software, 65(8), 1-14. URL http://www.jstatsoft.org/v65/i08/.
data(psychiatrists)
str(psychiatrists)
A loading matrix indicates how predictors that have been reduced to components - e.g., in principal covariates regression (De Jong & Kiers, 1992) - relate to these components. Usually, components are interpreted by looking at what the predictors with a clear non-zero loading have in common. To make this easier, this function reorders the predictors in a loading matrix so that the predictors with clear non-zero loadings on the first component are presented first (in order of decreasing loading), followed by the predictors with clear non-zero loadings on the second component, and so on.
SortLoadings(Px)
Px |
Dataframe that contains component loadings (components x predictors) |
SortLoadings returns a dataframe with the same dimensions and labels as the original loading matrix, but with the columns (referring to the predictors) presented in a different order.
Marlies Vervloet
De Jong, S., & Kiers, H. A. (1992). Principal covariates regression: Part I. Theory. Chemometrics and Intelligent Laboratory Systems, 14, 155-164.
Marlies Vervloet, Henk A. Kiers, Wim Van den Noortgate, Eva Ceulemans (2015). PCovR: An R Package for Principal Covariates Regression. Journal of Statistical Software, 65(8), 1-14. URL http://www.jstatsoft.org/v65/i08/.
# Compute loading matrix of alexithymia dataset
data(alexithymia)
results <- pcovr(alexithymia$X, alexithymia$Y)
Px <- results$Px
print(Px)

# Sort loading matrix
sorted_Px <- SortLoadings(results$Px)
print(sorted_Px)
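The reordering idea can also be sketched without the package. The function sort_loadings_sketch below is hypothetical and simplifies the rule described for SortLoadings: since this page does not define what counts as a "clear non-zero loading", each predictor is simply assigned to the component on which its absolute loading is largest.

```r
sort_loadings_sketch <- function(Px) {
  Px <- as.matrix(Px)

  # Component on which each predictor (column) has its largest absolute loading
  main_comp <- apply(abs(Px), 2, which.max)

  # Size of that largest absolute loading
  main_load <- apply(abs(Px), 2, max)

  # Order by component first, then by decreasing loading within a component
  Px[, order(main_comp, -main_load), drop = FALSE]
}

# Toy loading matrix (2 components x 4 predictors)
Px <- rbind(c(0.1, 0.8, 0.2, 0.9),
            c(0.7, 0.1, 0.6, 0.2))
colnames(Px) <- c("a", "b", "c", "d")

sorted <- sort_loadings_sketch(Px)
print(sorted)
colnames(sorted)   # "d" "b" "a" "c"
```

Predictors d and b load mainly on the first component and therefore come first, followed by a and c, which load mainly on the second component.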
Oblique target rotation
tarrotob(F1, W)
F1 |
Matrix to be rotated |
W |
Target binary matrix |
T |
Rotation matrix |
A |
Rotated matrix |
Marlies Vervloet
Px <- matrix(rnorm(36), 12, 3)
print(Px)
W <- matrix(rbinom(36, 1, .4), 12, 3)
Px_r <- tarrotob(Px, W)
print(Px_r$A)
This is an oblique rotation criterion, developed by Cureton and Mulaik (1975).
wvarim(F1, nrs = 20)
F1 |
Matrix to be rotated |
nrs |
Number of random starts |
Th |
Rotation matrix |
loadings |
Rotated matrix |
W |
Matrix of weights |
fr |
Varimax function value |
ir |
Number of iterations |
Marlies Vervloet
Cureton, E. E., & Mulaik, S. A. (1975). The weighted varimax rotation and the promax rotation. Psychometrika, 40(2), 183-195.
Marlies Vervloet, Henk A. Kiers, Wim Van den Noortgate, Eva Ceulemans (2015). PCovR: An R Package for Principal Covariates Regression. Journal of Statistical Software, 65(8), 1-14. URL http://www.jstatsoft.org/v65/i08/.
Px <- matrix(rnorm(36), 12, 3)
print(Px)
Px_r <- wvarim(Px)
print(Px_r$loadings)