Title: | Tools to Accompany the 'psych' Package for Psychological Research |
---|---|
Description: | Support functions, data sets, and vignettes for the 'psych' package. Contains several of the biggest data sets for the 'psych' package as well as four vignettes. A few helper functions for file manipulation are included as well. For more information, see the <https://personality-project.org/r/> web page. |
Authors: | William Revelle [aut, cre] |
Maintainer: | William Revelle <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.4.3 |
Built: | 2024-11-15 06:33:01 UTC |
Source: | CRAN |
16 multiple choice ability items for 1525 subjects, taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project, are saved as iqitems. Those data are shown as examples of how to score multiple choice tests and analyze response alternatives. When scored correct or incorrect, the data are useful for demonstrating tetrachoric based factor analysis (irt.fa) and for finding tetrachoric correlations.
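As a rough illustration of what scoring a multiple choice test involves, the base R sketch below compares each response with a keyed answer. The responses and the answer key are invented for this example and are not taken from the SAPA items; the psych package's own multiple choice scoring function may handle details differently.

```r
# Hypothetical responses: 5 people x 3 items, options coded 1-4 (NA = skipped)
responses <- data.frame(
  item1 = c(4, 4, 2, NA, 4),
  item2 = c(1, 3, 3, 3, 3),
  item3 = c(2, 2, 2, 1, NA)
)
key <- c(item1 = 4, item2 = 3, item3 = 2)   # hypothetical answer key

# Compare every column with its keyed answer; TRUE/FALSE becomes 1/0
scored <- sweep(as.matrix(responses), 2, key, FUN = "==") * 1

# Proportion of attempted items answered correctly, per person
prop_correct <- rowMeans(scored, na.rm = TRUE)
print(scored)
print(prop_correct)
```

Skipped items stay NA in this sketch, so each person's proportion is based only on the items they attempted.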
data(iqitems)
A data frame with 1525 observations on the following 16 variables. The number following the name is the item number from SAPA.
reason.4
Basic reasoning question
reason.16
Basic reasoning question
reason.17
Basic reasoning question
reason.19
Basic reasoning question
letter.7
In the following alphanumeric series, what letter comes next?
letter.33
In the following alphanumeric series, what letter comes next?
letter.34
In the following alphanumeric series, what letter comes next?
letter.58
In the following alphanumeric series, what letter comes next?
matrix.45
A matrix reasoning task
matrix.46
A matrix reasoning task
matrix.47
A matrix reasoning task
matrix.55
A matrix reasoning task
rotate.3
Spatial Rotation of type 1.2
rotate.4
Spatial Rotation of type 1.2
rotate.6
Spatial Rotation of type 1.1
rotate.8
Spatial Rotation of type 2.3
These 16 items were sampled from 80 items given as part of the SAPA (https://www.sapa-project.org/) project (Revelle, Wilt and Rosenthal, 2010; Condon and Revelle, 2014) to develop online measures of ability. They reflect four lower order factors (verbal reasoning, letter series, matrix reasoning, and spatial rotations). These lower level factors all share a higher level factor ('g').
This data set may be used to demonstrate item response functions, tetrachoric correlations, or irt.fa, as well as omega estimates of reliability and hierarchical structure.
In addition, the data set is a good example of doing item analysis to examine the empirical response probabilities of each item alternative as a function of the underlying latent trait. When doing this, it appears that two of the matrix reasoning problems do not have monotonically increasing trace lines for the probability correct. At moderately high ability (theta = 1) the probability correct is lower than at either theta = 0 or theta = 2.
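The idea of an empirical trace line can be sketched in base R with simulated data: group people by ability and find the proportion answering each item correctly within each group. Everything below is hypothetical (simulated abilities, three invented item difficulties, a one-parameter logistic model). With real data the latent trait is not observed, so one would bin on a total score or an IRT score instead.

```r
set.seed(42)                                   # reproducible simulation
n <- 2000
theta <- rnorm(n)                              # simulated latent ability
difficulty <- c(-1, 0, 1)                      # hypothetical item difficulties

# Simulate correct/incorrect answers from a one-parameter logistic model
items <- sapply(difficulty, function(b) rbinom(n, 1, plogis(theta - b)))
colnames(items) <- paste0("item", seq_along(difficulty))

# Bin people by ability and find the proportion correct in each bin
bins <- cut(theta, breaks = c(-Inf, -1, 0, 1, Inf))
trace <- aggregate(as.data.frame(items), by = list(theta_bin = bins), FUN = mean)
print(trace)
```

A well behaved item shows proportions that rise from the lowest to the highest bin; a dip in a higher bin is the kind of non-monotonic trace line described above.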
The example data set is taken from the Synthetic Aperture Personality Assessment personality and ability test at https://www.sapa-project.org/. The data were collected with David Condon from 8/08/12 to 8/31/12.
Similar data are available from the International Cognitive Ability Resource at https://www.icar-project.org/.
Condon, David and Revelle, William, (2014) The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52-64.
Revelle, William, Dworak, Elizabeth M. and Condon, David (2020) Cognitive ability in everyday life: the utility of open-source measures. Current Directions in Psychological Science, 29, (4) 358-363. Open access at doi:10.1177/0963721420922178.
Dworak, Elizabeth M., Revelle, William, Doebler, Philip and Condon, David (2021) Using the International Cognitive Ability Resource as an open source tool to explore individual differences in cognitive ability. Personality and Individual Differences, 169. Open access at doi:10.1016/j.paid.2020.109906.
Revelle, William, Wilt, Joshua, and Rosenthal, Allen (2010) Individual Differences in Cognition: New Methods for examining the Personality-Cognition Link. In Gruszka, Alexandra and Matthews, Gerald and Szymura, Blazej (Eds.) Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, Springer.
data(ability)
cs <- psych::cs
keys <- list(ICAR16 = colnames(ability),
    reasoning = cs(reason.4, reason.16, reason.17, reason.19),
    letters = cs(letter.7, letter.33, letter.34, letter.58),
    matrix = cs(matrix.45, matrix.46, matrix.47, matrix.55),
    rotate = cs(rotate.3, rotate.4, rotate.6, rotate.8))
psych::scoreOverlap(keys, ability)
# this next step takes a few seconds to run and demonstrates IRT approaches
ability.irt <- psych::irt.fa(ability)
ability.scores <- psych::scoreIrt(ability.irt, ability)
ability.sub.scores <- psych::scoreIrt.2pl(keys, ability)  # demonstrate irt scoring
# It is sometimes asked how to handle missing data when finding scores.
# This next example compares 3 ways of scoring ability items from ICAR:
#   just sum the items
#   find the mean of the items
#   IRT score the items
total <- rowSums(ability, na.rm = TRUE)
means <- rowMeans(ability, na.rm = TRUE)
irt <- psych::scoreIrt(items = ability)[1]
df <- data.frame(total, means, irt)
psych::pairs.panels(df)
A recurring question in the study of affect is the proper dimensionality and the relationship to various personality dimensions. Here is a data set taken from two studies of mood and arousal using movies to induce affective states.
data(affect)
These are data from two studies conducted in the Personality, Motivation and Cognition Laboratory at Northwestern University. Both studies used a similar methodology:
Collection of pretest data using 5 scales from the Eysenck Personality Inventory and items taken from the Motivational State Questionnaire (see msq). In addition, state and trait anxiety measures were given. In the "maps" study, the Beck Depression Inventory was also given.
Then subjects were randomly assigned to one of four movie conditions:
1: Frontline. A documentary about the liberation of the Bergen-Belsen concentration camp.
2: Halloween. A horror film.
3: National Geographic. A nature film about the Serengeti plain.
4: Parenthood. A comedy.
Each film clip was shown for 9 minutes. Following this, the MSQ was given again.
Data from the MSQ were scored for Energetic and Tense Arousal (EA and TA) as well as Positive and Negative Affect (PA and NA).
The "flat" study had 170 participants; the "maps" study had 160.
These studies are described in more detail in various publications from the PMC lab, in particular Revelle and Anderson (1997) and Rafaeli and Revelle (2006). An analysis of these data has also appeared in Smillie et al. (2012).
For a much more complete data set involving film, caffeine, and time of day manipulations, see the msqR data set.
Data collected at the Personality, Motivation, and Cognition Laboratory, Northwestern University.
Revelle, William and Anderson, Kristen Joan (1997) Personality, motivation and cognitive performance: Final report to the Army Research Institute on contract MDA 903-93-K-0008
Rafaeli, Eshkol and Revelle, William (2006), A premature consensus: Are happiness and sadness truly opposite affects? Motivation and Emotion, 30, 1, 1-12.
Smillie, Luke D. and Cooper, Andrew and Wilt, Joshua and Revelle, William (2012) Do Extraverts Get More Bang for the Buck? Refining the Affective-Reactivity Hypothesis of Extraversion. Journal of Personality and Social Psychology, 103 (2), 306-326.
data(affect)
psych::describeBy(affect[-1], group = "Film")
psych::pairs.panels(affect[14:17], bg = c("red","black","white","blue")[affect$Film], pch = 21,
    main = "Affect varies by movies")
psych::errorCircles("EA2", "TA2", data = affect, group = "Film",
    labels = c("Sad","Fear","Neutral","Humor"),
    main = "Energetic and Tense Arousal by Movie condition")
psych::errorCircles(x = "PA2", y = "NA2", data = affect, group = "Film",
    labels = c("Sad","Fear","Neutral","Humor"),
    main = "Positive and Negative Affect by Movie condition")
Athenstaedt (2003) examined Gender Role Self-Concept. She reports two independent dimensions of Male and Female behaviors. While there are large gender/sex differences on both of these dimensions, the two represent independent factors. Eagly and Revelle (2022) have used these data to explore the power of aggregation when examining sex differences. This data set is also useful to show various graphical display procedures.
data("Athenstaedt")
A data frame with 576 observations on the following 117 variables.
STUDIE
a numeric vector
gender
Male =1, Female= 2
self report items (see Athenstaedt.dictionary)
Gender (Male = 1, Female =2)
To pay attention to ones appearance in the office
Offer fire to somebody
Paint an Apartment
Mow the Lawn
Make the Bed
Hold the Door Open for your Partner
Do the Dishes
Do Extreme Sports
Tinker with the Car
Talk about Sports
Assemble Prefabricated Furniture
Drive a Car in a Risky Way
Listen Attentively to Others
Tell your Partner about Problems at Work
Play on a Computer
Set the Table
Watch ones Weight
Care for a Partner if he/she is Ill
Play Chess
Meet with friends at a Regulars Table
Watch Soap Operas
Take a Friends Arm
Wrap Presents Beautifully
In case of Vacation with Partner Packing the Luggage for Both
To admit own Occupational Weakness
Work Overtime
Openly Show Vulnerability
Babysit
Change Fuses
Clean a Drain
Take Care of Somebody
Do Repair Work
Change Light Bulbs
Wash the Car
Ride a Motorcycle
Cook Meat on the Grill
Thump Carpets
Dust the Furniture
Buy Electric Appliances
Go Dancing
Go for a Walk through Town
Go to the Ballet
Hug a Friend
Do Handiwork (e.g. Knitting)
Change Bed Sheets
Sew on a Button
Do Aerobics
Watch Sports on Television
Talk about Problems
Play Parlor Games
Talk about Politics
Take Care of Flowers
Make Coffee in the Office
Shovel Snow
Read non-Fiction Books
Organize Company Parties
Do Home Improvement Jobs
Plead for the Socially Disadvantaged
Buy a Present for a Colleague
To Talk with Colleagues about Family Matters
Make Jam
Frequently Ask Colleagues Questions
Decorate the Office with Flowers
Pick up the Dinner Bill
Shop for the Family
Have Problem using Technical Devices
Care for Family Besides a Job
Watch Action Movies
Cook
Help your Partner Put on His or Her Coat
Wash Windows
Do the Ironing
Do the Laundry
Put on Make-up
Femininity Scale
Masculinity Scale
Femininity Scale
Masculinity Scale
Pooled Scale
see the original Athenstaedt paper
FBEHAV
a numeric vector
MBEHAV
a numeric vector
Femininity
a numeric vector
Masculinity
a numeric vector
MF
a numeric vector
Ursula Athenstaedt (2003) reported several analyses of items and scales measuring Gender Role Self-Concept. Eagly and Revelle (2022) have used these data in an analysis of the power of aggregation. Here are the original items as well as the three scales used by Eagly and Revelle (2022). The accompanying Athenstaedt.dictionary may be used to see the items.
See the GERAS data set for a related example.
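The "power of aggregation" can be illustrated with a small simulation: many items that each show a modest group difference combine into a scale with a much larger standardized difference. The sketch below uses simulated, independent items with an invented effect size of 0.2 per item; it does not use the Athenstaedt data, and real items are correlated, which reduces the gain.

```r
set.seed(1)
n_per_group <- 2000
n_items <- 10
item_d <- 0.2    # hypothetical standardized difference on every item

# Independent items: group 2 is shifted by item_d on each one
g1 <- matrix(rnorm(n_per_group * n_items), ncol = n_items)
g2 <- matrix(rnorm(n_per_group * n_items, mean = item_d), ncol = n_items)

# Cohen's d with a pooled standard deviation (equal group sizes)
cohen_d <- function(x, y) (mean(y) - mean(x)) / sqrt((var(x) + var(y)) / 2)

d_items <- sapply(seq_len(n_items), function(j) cohen_d(g1[, j], g2[, j]))
d_scale <- cohen_d(rowSums(g1), rowSums(g2))

print(round(c(mean_item_d = mean(d_items), scale_d = d_scale), 2))
```

With uncorrelated items the expected scale difference is the item difference times the square root of the number of items, so ten items at 0.2 give roughly 0.63 in expectation.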
Ursula Athenstaedt, personal communication, 2022, provided an SPSS sav file with the original data from which the complete cases in this set were selected.
Ursula Athenstaedt (2003) On the Content and Structure of the Gender Role Self-Concept: Including Gender-Stereotypical Behaviors in Addition to Traits. Psychology of Women Quarterly, 27, 309-318. doi: 10.1111/1471-6402.00111.
Alice Eagly and William Revelle (2022) Understanding the Magnitude of Psychological Differences Between Women and Men Requires Seeing the Forest and the Trees. Perspectives on Psychological Science. doi:10.1177/17456916211046006.
data(Athenstaedt)
psych::scatterHist(Femininity ~ Masculinity + gender, data = Athenstaedt,
    cex.point = .4, smooth = FALSE, correl = FALSE, d.arrow = TRUE,
    col = c("red","blue"), lwd = 4, cex.main = 1.5,
    main = "Scatter Plot and Density", cex.axis = 2)
psych::cohen.d(Athenstaedt[2:76], group = "gender", dictionary = Athenstaedt.dictionary)
# show the top 5 items for each scale
select <- c(psych::selectFromKeys(Athenstaedt.keys$MF10), "gender")
psych::corPlot(Athenstaedt[, select], main = "F and M items from Athenstaedt")
25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analysis. Three additional demographic variables (sex, education, and age) are also included.
data(bfi)
data(bfi.dictionary)
A data frame with 2800 observations on the following 28 variables. (The q numbers are the SAPA item numbers).
A1
Am indifferent to the feelings of others. (q_146)
A2
Inquire about others' well-being. (q_1162)
A3
Know how to comfort others. (q_1206)
A4
Love children. (q_1364)
A5
Make people feel at ease. (q_1419)
C1
Am exacting in my work. (q_124)
C2
Continue until everything is perfect. (q_530)
C3
Do things according to a plan. (q_619)
C4
Do things in a half-way manner. (q_626)
C5
Waste my time. (q_1949)
E1
Don't talk a lot. (q_712)
E2
Find it difficult to approach others. (q_901)
E3
Know how to captivate people. (q_1205)
E4
Make friends easily. (q_1410)
E5
Take charge. (q_1768)
N1
Get angry easily. (q_952)
N2
Get irritated easily. (q_974)
N3
Have frequent mood swings. (q_1099)
N4
Often feel blue. (q_1479)
N5
Panic easily. (q_1505)
O1
Am full of ideas. (q_128)
O2
Avoid difficult reading material. (q_316)
O3
Carry the conversation to a higher level. (q_492)
O4
Spend time reflecting on things. (q_1738)
O5
Will not probe deeply into a subject. (q_1964)
gender
Males = 1, Females =2
education
1 = HS, 2 = finished HS, 3 = some college, 4 = college graduate, 5 = graduate degree
age
age in years
The first 25 items are organized by five putative factors: Agreeableness, Conscientiousness, Extraversion, Neuroticism, and Openness. The scoring key is created using make.keys, and the scores are found using score.items.
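The idea behind a scoring key can be sketched in base R: items keyed negatively are reflected about the middle of the response scale before the items are averaged. The responses below are invented, and the sketch simply ignores missing responses; the psych scoring functions offer more options (for example for handling missing data), so this is an illustration of the principle rather than a reimplementation.

```r
# Hypothetical responses of 4 people to 3 items on a 1-6 scale
x <- data.frame(A1 = c(1, 6, 2, 5),
                A2 = c(6, 1, 5, 2),
                A3 = c(5, 2, NA, 3))
keys <- c(A1 = -1, A2 = 1, A3 = 1)   # -1 marks a reverse-keyed item
lo <- 1; hi <- 6                     # response scale limits

# Reverse-keyed items are reflected about the scale midpoint: (lo + hi) - x
reflected <- x
rev_items <- names(keys)[keys < 0]
reflected[rev_items] <- (lo + hi) - x[rev_items]

# Scale score = mean of the (reflected) items, ignoring missing responses
agree <- rowMeans(reflected, na.rm = TRUE)
print(round(agree, 2))
```

Supplying the scale limits matters: reflecting a 1-6 item requires knowing that the possible minimum and maximum are 1 and 6, even if nobody in the sample used the extreme categories.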
These five factors are a useful example of using irt.fa to do Item Response Theory based latent factor analysis of the polychoric correlation matrix. The endorsement plots for each item, as well as the item information functions, reveal that the items differ in their quality.
The item data were collected using a 6 point response scale (1 Very Inaccurate, 2 Moderately Inaccurate, 3 Slightly Inaccurate, 4 Slightly Accurate, 5 Moderately Accurate, 6 Very Accurate) as part of the Synthetic Aperture Personality Assessment (SAPA https://www.sapa-project.org/) project. To see an example of the data collection technique, visit https://www.SAPA-project.org/ or the International Cognitive Ability Resource at https://icar-project.org. The items given were sampled from the International Personality Item Pool of Lewis Goldberg using the sampling technique of SAPA. This is a sample data set taken from the much larger SAPA data bank.
The bfi data set and items should not be confused with the BFI (Big Five Inventory) of Oliver John and colleagues (John, O. P., Donahue, E. M., & Kentle, R. L. (1991). The Big Five Inventory–Versions 4a and 54. Berkeley, CA: University of California, Berkeley, Institute of Personality and Social Research.)
The items are from the ipip (Goldberg, 1999). The data are from the SAPA project (Revelle, Wilt and Rosenthal, 2010), collected Spring, 2010 (https://www.sapa-project.org/).
Goldberg, L.R. (1999) A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models. In Mervielde, I. and Deary, I. and De Fruyt, F. and Ostendorf, F. (eds) Personality psychology in Europe. 7. Tilburg University Press. Tilburg, The Netherlands.
Revelle, W., Wilt, J., and Rosenthal, A. (2010) Individual Differences in Cognition: New Methods for examining the Personality-Cognition Link In Gruszka, A. and Matthews, G. and Szymura, B. (Eds.) Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, Springer.
Revelle, W., Condon, D.M., Wilt, J., French, J.A., Brown, A., and Elleman, L.G. (2016) Web and phone based data collection using planned missing designs. In Fielding, N.G., Lee, R.M. and Blank, G. (Eds). SAGE Handbook of Online Research Methods (2nd Ed), Sage Publications.
bi.bars to show the data by age and gender, irt.fa for item factor analysis applying the irt model.
data(bfi)
psych::describe(bfi)
# create the bfi.keys (actually already saved in the data file)
bfi.keys <- list(agree = c("-A1","A2","A3","A4","A5"),
    conscientious = c("C1","C2","C3","-C4","-C5"),
    extraversion = c("-E1","-E2","E3","E4","E5"),
    neuroticism = c("N1","N2","N3","N4","N5"),
    openness = c("O1","-O2","O3","O4","-O5"))
scores <- psych::scoreItems(bfi.keys, bfi, min = 1, max = 6)  # specify the minimum and maximum values
scores
# show the use of keys.lookup with a dictionary
psych::keys.lookup(bfi.keys, bfi.dictionary[, 1:4])
Lew Goldberg organized 100 adjectives to measure 5 factors of personality (The Big5). Five hundred participants were given these adjectives along with other personality measures. This dictionary allows for easy item labeling of the results.
data("BFI.adjectives.dictionary")
A data frame with 100 observations on the following 2 variables.
numer
a character vector of the item label
Item
a character vector of the actual adjectives
Keying information for the 100 adjectives:
Data collected at the Personality, Motivation, and Cognition Laboratory, Northwestern University.
Lewis R. Goldberg,(1992) The development of markers for the Big-Five factor structure, Psychological Assessment, 4 (1) 26-42.
big5.100.adjectives for examples of the data.
msqR for 3896 participants with scores on five scales of the EPI.
affect for an example of the use of some of these adjectives in a mood manipulation study.
data(BFI.adjectives.dictionary)
# this includes the bfi.adjectives.keys
bfi.adjectives.keys <- list(
    Agreeableness = psych::cs(V2, -V11, V14, V15, -V19, -V21, V29, -V31, V32, V48,
        V55, -V61, -V63, V69, V76, -V78, -V79, -V90, -V94, V99),
    Conscientiousness = psych::cs(V9, -V10, V13, -V20, V22, -V30, -V37, -V38, -V39, V50,
        -V51, V53, V56, V57, -V67, V68, V70, V73, -V82, -V95),
    Extraversion = psych::cs(V1, V5, -V6, V7, V17, V24, V26, -V40, -V45, -V58,
        -V60, -V65, V71, -V74, -V77, V92, -V96, V97, V98, -V100),
    Neuroticism = psych::cs(V3, V23, V25, V27, V28, V33, -V36, V42, V46, V47,
        V49, V52, -V59, V62, V72, V75, -V81, -V83, -V84, -V85),
    Openness = psych::cs(V4, V8, V12, V16, V18, V34, -V35, V41, V43, V44,
        V54, -V64, -V66, -V80, -V86, -V87, -V88, -V89, -V91, -V93)
)
psych::lookupFromKeys(bfi.adjectives.keys, bfi.adjectives.dictionary, 20)
Lew Goldberg organized 100 adjectives to measure 5 factors of personality (The Big5). Five hundred participants were given these adjectives along with other personality measures in the Personality, Motivation and Cognition (PMC) lab. This data set is for demonstrations of factor and cluster analysis.
data("big5.100.adjectives")
A data frame with 554 observations on the following 102 variables.
study
a character vector
id
a numeric vector
V1
numeric vector (see big5.adjectives.dictionary)
V100
A numeric vector. (see big5.adjectives.dictionary)
a key list
Procedure. The data were collected over nine years in the Personality, Motivation and Cognition laboratory at Northwestern, as part of a series of studies examining the effects of personality and situational factors on motivational state and subsequent cognitive performance. In each of 38 studies, prior to any manipulation of motivational state, participants signed a consent form and, in some studies, consumed 0 or 4 mg/kg of caffeine. In caffeine studies, they waited 30 minutes and then filled out the MSQ as well as other personality trait measures (e.g. the Big 5 adjectives).
Data collected at the Personality, Motivation, and Cognition Laboratory, Northwestern University.
Lewis R. Goldberg,(1992) The development of markers for the Big-Five factor structure, Psychological Assessment, 4 (1) 26-42.
Revelle, W. and Anderson, K.J. (1998) Personality, motivation and cognitive performance: Final report to the Army Research Institute on contract MDA 903-93-K-0008. (https://www.personality-project.org/revelle/publications/ra.ari.98.pdf).
data(big5.100.adjectives)
five.scores <- psych::scoreItems(big5.adjectives.keys, big5.100.adjectives)
summary(five.scores)
Normally, minres factor analysis and maximum likelihood produce very similar results. This data set (from Alexandra Blant) does not. Warnings are given for the minres and pa solutions, but not for the old.min or mle solutions. Included as a test case for the factor analysis function.
data("blant")
The format is: num [1:29, 1:29] 1 0.77 0.813 0.68 0.717 ... - attr(*, "dimnames")=List of 2 ..$ : NULL ..$ : chr [1:29] "V1" "V2" "V3" "V4" ...
This data matrix was sent by Alexandra Blant as an example of a problem with the minres solution in the fa function. The default solution, using fm="minres", issues a warning that the solution has improper factor score weights. This is not the case for the fm="old.min" and fm="mle" options, but it is the case for fm="pa" and fm="ols".
The residuals are indeed smaller for fm="minres" than for fm="old.min" or fm="mle".
"old.min" attempts to find the minimum residual but uses the gradient for mle. This was the approach until version 1.7.5 but was changed (see the help page for fa) following extensive communication with Hao Wu.
The problem with this matrix is probably that it is almost singular, with some smcs approaching 1 and the smallest three eigenvalues of .006, .004 and .001.
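Both symptoms of near singularity mentioned here, squared multiple correlations (smcs) close to 1 and very small eigenvalues, can be checked in base R. The sketch below uses a small invented correlation matrix rather than the blant data; the smc is computed from the inverse of the correlation matrix as 1 - 1/diag(R^-1).

```r
# Hypothetical 3-variable correlation matrix in which V3 is almost
# perfectly predictable from V1 and V2
R <- matrix(c(1.00, 0.50, 0.86,
              0.50, 1.00, 0.86,
              0.86, 0.86, 1.00), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("V", 1:3), paste0("V", 1:3)))

smc <- 1 - 1 / diag(solve(R))               # squared multiple correlations
ev  <- eigen(R, symmetric = TRUE)$values    # eigenvalues, largest first

print(round(smc, 3))
print(round(ev, 4))
```

In this invented matrix the smc of V3 is close to 1 and the smallest eigenvalue is close to 0, which is the same pattern described for the blant matrix on a smaller scale.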
This problem matrix was provided by Alexandra Blant.
Alexandra Blant, personal communication
data(blant)
# compare
f5 <- psych::fa(blant, 5, rotate = "none")                      # the default minres
f5.old <- psych::fa(blant, 5, fm = "old.min", rotate = "none")  # old version of minres
f5.mle <- psych::fa(blant, 5, fm = "mle", rotate = "none")      # maximum likelihood
# compare solutions
psych::factor.congruence(list(f5, f5.old, f5.mle))
# compare sums of squared residuals
sum(residuals(f5, diag = FALSE)^2, na.rm = TRUE)      # 1.355489
sum(residuals(f5.old, diag = FALSE)^2, na.rm = TRUE)  # 1.539757
sum(residuals(f5.mle, diag = FALSE)^2, na.rm = TRUE)  # 2.402092
# but, when we divide the squared residuals by the original (squared) correlations, we find
# a different ordering of fit
f5$fit      # 0.9748177
f5.old$fit  # 0.9752774
f5.mle$fit  # 0.9603324
35 items for 150 subjects from Bond's Logical Operations Test. A good example of Item Response Theory analysis using the Rasch model. One parameter (Rasch) analysis and two parameter IRT analyses produce somewhat different results.
data(blot)
A data frame with 150 observations on 35 variables. The BLOT was developed as a paper and pencil test for children to measure Logical Thinking as discussed by Piaget and Inhelder.
Bond and Fox apply Rasch modeling to a variety of data sets. This one, Bond's Logical Operations Test, is used as an example of Rasch modeling for dichotomous items. In their text (p 56), Bond and Fox report the results using WINSTEPS. Those results are consistent (up to a scaling parameter) with those found by the rasch function in the ltm package. WINSTEPS seems to produce difficulty estimates with a mean item difficulty of 0, whereas rasch from ltm has a mean difficulty of -1.52. In addition, rasch seems to reverse the signs of the difficulty estimates when reporting the coefficients and is effectively reporting "easiness".
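Why two programs can report different mean difficulties and opposite signs yet describe the same model can be seen from the Rasch response function itself, which depends only on the difference between ability and difficulty. The base R sketch below uses invented difficulties and an invented ability value, not BLOT estimates.

```r
rasch_p <- function(theta, b) plogis(theta - b)   # P(correct) under the Rasch model

b <- c(-2.0, -1.5, -1.0)      # hypothetical difficulties (not BLOT estimates)
theta <- 0.5                  # hypothetical person ability

# Re-centre difficulties to mean 0 and shift ability by the same amount
shift <- mean(b)
b_centred <- b - shift
theta_centred <- theta - shift

p_original <- rasch_p(theta, b)
p_centred  <- rasch_p(theta_centred, b_centred)
print(all.equal(p_original, p_centred))            # same probabilities

# "Easiness" is difficulty with the sign reversed: P depends on theta + easiness
easiness <- -b
print(all.equal(p_original, plogis(theta + easiness)))
```

Because only differences matter, fixing the mean item difficulty at 0 or fixing the mean ability at 0 are equally valid ways of setting the origin of the scale.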
However, when using a two parameter model, one of the items (V12) behaves very differently.
This data set is useful when comparing 1PL, 2PL and 2PN IRT models.
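For readers comparing these models, a brief sketch of how the two-parameter logistic (2PL) and two-parameter normal ogive (2PN) response functions relate may help. With the conventional scaling constant of 1.702 the two curves nearly coincide; a 1PL model is the special case in which every item shares the same discrimination. The discrimination and difficulty values below are invented for illustration.

```r
theta <- seq(-4, 4, by = 0.01)
a <- 1.2; b <- 0.3                          # hypothetical discrimination and difficulty

p_2pn <- pnorm(a * (theta - b))             # two-parameter normal ogive
p_2pl <- plogis(1.702 * a * (theta - b))    # logistic with the usual scaling constant

print(max(abs(p_2pn - p_2pl)))              # stays below 0.01
```

Differences between 2PL and 2PN estimates for the same data therefore mostly reflect this change of metric, whereas differences between 1PL and 2PL solutions reflect items whose discriminations are not equal.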
The data are taken (with kind permission from Trevor Bond) from the webpage https://www.winsteps.com/BF3/bondfox3.htm and read using read.fwf.
T.G. Bond. BLOT: Bond's Logical Operations Test. Townsville, Australia: James Cook University. (Original work published 1976), 1995.
T. Bond and C. Fox. (2007) Applying the Rasch model: Fundamental measurement in the human sciences. Lawrence Erlbaum, Mahwah, NJ, US, 2nd edition.
See also the irt.fa and associated plot functions.
data(blot)
# ltm is not required by psychTools, but if available, may be run to show a Rasch model
# do the same thing with functions in psych
blot.fa <- psych::irt.fa(blot)  # a 2PN model
plot(blot.fa)
Cyril Burt reported an early factor analysis with a circumplex structure of 11 emotional variables in 1915. 8 of these were subsequently used by Harman in his text on factor analysis. Unfortunately, it seems as if Burt made a mistake for the matrix is not positive definite. With one change from .87 to .81 the matrix is positive definite.
data(burt)
A correlation matrix based upon 172 "normal school age children aged 9-12".
Sociality
Sorrow
Tenderness
Joy
Wonder
Elation
Disgust
Anger
Sex
Fear
Subjection
The Burt data set is interesting for several reasons. It seems to be an early example of the organization of emotions into an affective circumplex, a subset of it has been used for factor analysis examples (see Harman.Burt), and it is an example of how typos affect data. The original data matrix has one negative eigenvalue. With the replacement of the correlation between Sorrow and Tenderness from .87 to .81, the matrix is positive definite.
Alternatively, using cor.smooth, the matrix can be made positive definite as well, although cor.smooth makes more (but smaller) changes.
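The general idea of smoothing a matrix that is not positive definite can be sketched in base R: lift any non-positive eigenvalues to a small positive value, rebuild the matrix, and rescale it to a unit diagonal. This is a simplified illustration on an invented 3 x 3 matrix; the function name smooth_cor and the tolerance are assumptions for this example, and the details of cor.smooth in psych may differ.

```r
# Hypothetical "correlation" matrix that is not positive definite
R <- matrix(c(1.0, 0.9, 0.2,
              0.9, 1.0, 0.9,
              0.2, 0.9, 1.0), nrow = 3, byrow = TRUE)
print(eigen(R, symmetric = TRUE)$values)     # the smallest one is negative

# smooth_cor is a hypothetical helper written for this sketch
smooth_cor <- function(R, tol = 1e-6) {
  e <- eigen(R, symmetric = TRUE)
  vals <- pmax(e$values, tol)                # lift non-positive eigenvalues
  S <- e$vectors %*% diag(vals) %*% t(e$vectors)
  out <- cov2cor(S)                          # restore a unit diagonal
  dimnames(out) <- dimnames(R)
  out
}

R_smooth <- smooth_cor(R)
print(min(eigen(R_smooth, symmetric = TRUE)$values) > 0)
print(round(R_smooth - R, 3))                # changes made by smoothing
```

Because the correction is spread across the eigenvectors, smoothing typically adjusts several correlations by small amounts, in contrast to correcting a single suspected typo.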
(retrieved from the web at https://www.biodiversitylibrary.org/item/95822#790) Following a suggestion by Jan DeLeeuw.
Burt, C. General and Specific Factors underlying the Primary Emotions. Reports of the British Association for the Advancement of Science, 85th meeting, held in Manchester, September 7-11, 1915. London, John Murray, 1916, p. 694-696 (retrieved from the web at https://www.biodiversitylibrary.org/item/95822#790)
Harman.Burt in the Harman dataset and cor.smooth
data(burt)
eigen(burt)$values  # one is negative!
burt.new <- burt
burt.new[2,3] <- burt.new[3,2] <- .81
eigen(burt.new)$values  # all are positive
bs <- psych::cor.smooth(burt)
round(burt.new - bs, 3)
Airline distances between 11 US cities may be used as an example for multidimensional scaling or cluster analysis.
data(cities)
A data frame with 11 observations on the following 11 variables.
ATL
Atlanta, Georgia
BOS
Boston, Massachusetts
ORD
Chicago, Illinois
DCA
Washington, District of Columbia
DEN
Denver, Colorado
LAX
Los Angeles, California
MIA
Miami, Florida
JFK
New York, New York
SEA
Seattle, Washington
SFO
San Francisco, California
MSY
New Orleans, Louisiana
An 11 x 11 matrix of distances between major US airports. This is a useful demonstration of multidimensional scaling.
city.location is a dataframe of longitude and latitude for those cities.
Note that the 2 dimensional MDS solution does not perfectly capture the data from these city distances. Boston, New York and Washington, D.C. are located slightly too far west, and Seattle and LA are slightly too far south.
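For contrast, classical multidimensional scaling reproduces distances exactly when the points really do lie on a flat plane. The base R sketch below uses four invented points, not the city coordinates. One plausible reason the city solution is imperfect is that airline distances are measured over a curved surface, though that explanation is an assumption here rather than something the data set documentation states.

```r
# Four hypothetical points on a flat plane (not the city coordinates)
pts <- rbind(A = c(0, 0), B = c(3, 0), C = c(3, 4), D = c(0, 4))
d_plane <- dist(pts)

# Classical MDS recovers the configuration up to rotation and reflection
fit <- cmdscale(d_plane, k = 2, eig = TRUE)

print(max(abs(dist(fit$points) - d_plane)))   # essentially zero for planar data
print(fit$GOF)                                # goodness of fit; 1 means a perfect fit
```

The recovered coordinates can still be rotated or mirrored relative to the originals, which is why the cities example flips the axes before plotting.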
https://www.timeanddate.com/worldclock/distance.html
data(cities)
city.location[,1] <- -city.location[,1]  #included in the cities data set
plot(city.location, xlab="Dimension 1", ylab="Dimension 2",
   main ="Multidimensional scaling of US cities")
#do the mds
city.loc <- cmdscale(cities, k=2)  #ask for a 2 dimensional solution
round(city.loc,0)
city.loc <- -city.loc  #flip the axes
city.loc <- psych::rescale(city.loc,apply(city.location,2,mean),apply(city.location,2,sd))
points(city.loc,type="n")  #add the data points to the map
text(city.loc,labels=names(cities))
## Not run: 
#we need the maps package to be available
#an overlay map can be added if the package maps is available
if(require(maps)) {
map("usa",add=TRUE)
}
## End(Not run)
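What cmdscale recovers from a distance matrix can be seen with a toy case in base R. The four points below are hypothetical planar coordinates (not the cities data). Because they lie exactly in a plane, a 2 dimensional solution reproduces their distances, which is not quite true for distances between cities on a sphere.

```r
# Hypothetical planar coordinates for four points (not the cities data)
pts <- matrix(c(0, 0,
                3, 0,
                3, 4,
                0, 4), ncol = 2, byrow = TRUE,
              dimnames = list(c("A", "B", "C", "D"), c("x", "y")))
d <- dist(pts)                 # Euclidean distances between the points
fit <- cmdscale(d, k = 2)      # classical MDS, 2 dimensional solution
round(fit, 2)                  # coordinates are recovered only up to rotation/reflection
max(abs(dist(fit) - d))        # the distances themselves are reproduced
```

The recovered configuration may be rotated or reflected relative to the original, which is why the cities example flips the axes before plotting.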
Colom et al. analyze 14 tests from the Spanish version of the WAIS. This is a nice example of a hierarchical structure using the omega function. Here are the correlation matrices of the variables for the complete sample (colom) and for 4 levels of education (colom.ed0 to colom.ed3).
data("colom")
data("colom.ed0")
data("colom.ed1")
data("colom.ed2")
data("colom.ed3")
The format is:
 num [1:14, 1:14] 1 0.755 0.608 0.555 0.715 0.729 0.627 0.616 0.606 0.598 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:14] "Vocabulary" "Similarities" "Arithmetic" "Digit_span" ...
  ..$ : chr [1:14] "Vocabulary" "Similarities" "Arithmetic" "Digit_span" ...
The Wechsler Adult Intelligence Scale (WAIS) is the "gold standard" measure of intelligence. Here is an example of the correlational structure of 14 tests. It was used by Colom and his colleagues to find correlations of WAIS scores as a function of education. Here we show the complete standardization sample.
The colom data set is the complete correlation matrix for all subjects (703 females, 666 males). The four subset data sets are for four levels of education, with Ns = 301, 432, 525, and 111.
Colom et al, 2002
Roberto Colom and Francisco J Abad and Luis F García and Manuel Juan-Espinosa, 2002, Education, Wechsler's Full Scale IQ, and g. Intelligence, 30, 449-462.
data(colom)
psych::lowerMat(colom)
psych::omega(colom, 4)  #do the omega analysis
Francis Galton introduced the 'co-relation' in 1888 with a paper discussing how to measure the relationship between two variables. His primary example was the relationship between height and forearm length. The data table (cubits) is taken from Galton (1888). Unfortunately, there seem to be some errors in the original data table in that the marginal totals do not match the table.
The data frame, heights, is converted from this table.
data(cubits)
A data frame with 9 observations on the following 8 variables.
16.5
Cubit length < 16.5
16.75
16.5 <= Cubit length < 17.0
17.25
17.0 <= Cubit length < 17.5
17.75
17.5 <= Cubit length < 18.0
18.25
18.0 <= Cubit length < 18.5
18.75
18.5 <= Cubit length < 19.0
19.25
19.0 <= Cubit length < 19.5
19.75
19.5 <= Cubit length
Sir Francis Galton (1888) published the first demonstration of the correlation coefficient. The regression (or reversion to mediocrity) of the height to the length of the left forearm (a cubit) was found to be .8. There seem to be some errors in the table as published in that the row sums do not agree with the actual row sums. These data are used to create a matrix using table2matrix for demonstrations of analysis and displays of the data.
Galton (1888)
Galton, Francis (1888) Co-relations and their measurement. Proceedings of the Royal Society. London Series, 45, 135-145.
table2matrix, table2df, ellipses, heights, peas, galton
data(cubits)
cubits
heights <- psych::table2df(cubits,labs = c("height","cubit"))
psych::ellipses(heights,n=1,main="Galton's co-relation data set")
psych::ellipses(jitter(heights$height,3),jitter(heights$cubit,3),pch=".",
   main="Galton's co-relation data set",xlab="height",
   ylab="Forearm (cubit)")  #add in some noise to see the points
psych::pairs.panels(heights,jiggle=TRUE,main="Galton's cubits data set")
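Converting a frequency table into one row per case, as is done for the cubits table, can be sketched in base R. This is an illustration only, not the psych::table2df code, and the 2 x 2 table of counts is hypothetical (these are not Galton's counts).

```r
# A small hypothetical frequency table (not Galton's counts)
tab <- matrix(c(2, 1,
                0, 3), nrow = 2, byrow = TRUE,
              dimnames = list(height = c("64", "66"), cubit = c("17", "18")))

# One row per cell, with the count in column Freq
long <- as.data.frame(as.table(tab), stringsAsFactors = FALSE)

# Repeat each cell's row Freq times to get one row per case
cases <- long[rep(seq_len(nrow(long)), long$Freq), c("height", "cubit")]
cases[] <- lapply(cases, as.numeric)   # category labels back to numbers
nrow(cases)                            # equals sum(tab)
table(cases$height, cases$cubit)       # reproduces the original counts
```

Once the data are in case form, ordinary functions such as cor or a scatter plot can be applied directly.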
The classic data set used by Gosset (publishing as Student) for the introduction of the t-test. The design was a within subjects study with hours of sleep in a control condition compared to those in 3 drug conditions. Drug1 was .6 mg of L Hyoscyamine; Drug2L and Drug2R were said to be .6 mg of Left and Right isomers of Hyoscine. As discussed by Zabell (2008), these were not optical isomers. The delta1, delta2L and delta2R variables are changes from the baseline control.
data(cushny)
A data frame with 10 observations on the following 7 variables.
Control
Hours of sleep in a control condition
drug1
Hours of sleep in Drug condition 1
drug2L
Hours of sleep in Drug condition 2
drug2R
Hours of sleep in Drug condition 3 (an isomer of the drug in condition 2)
delta1
Change from control, drug 1
delta2L
Change from control, drug 2L
delta2R
Change from control, drug 2R
The original analysis by Student is used as an example for the t-test function, both as a paired t-test and a two group t-test. The data are also useful for a repeated measures analysis of variance.
Cushny, A.R. and Peebles, A.R. (1905) The action of optical isomers: II hyoscines. The Journal of Physiology 32, 501-510.
Student (1908) The probable error of the mean. Biometrika, 6 (1), 1-25.
See also the data set sleep and the examples for the t.test
S. L. Zabell. On Student's 1908 Article "The Probable Error of a Mean". Journal of the American Statistical Association, Vol. 103, No. 481 (Mar., 2008), pp. 1-20.
data(cushny)
with(cushny, t.test(drug1,drug2L,paired=TRUE))  #within subjects
psych::error.bars(cushny[1:4],within=TRUE,ylab="Hours of sleep",xlab="Drug condition",
   main="95% confidence of within subject effects")
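The difference between treating these data as paired or as two independent groups can be sketched with the sleep data set that ships with base R, mentioned above. The variable names x, y and d are just local names for this illustration.

```r
# sleep holds the change in hours of sleep for 10 people under two drugs
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]

paired   <- t.test(x, y, paired = TRUE)   # within subjects
unpaired <- t.test(x, y)                  # two independent groups (Welch)

# The paired t is a one sample t on the differences
d <- x - y
t.manual <- mean(d) / (sd(d) / sqrt(length(d)))

c(paired = unname(paired$statistic),
  manual = t.manual,
  unpaired = unname(unpaired$statistic))
```

Because each person serves as their own control, the paired test removes between person variability and yields a larger t in absolute value than the two group test for these data.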
A set of handy helper functions to convert data frames or matrices to LaTeX or rtf tables. Although Sweave is the preferred means of converting R output to LaTeX, it is sometimes useful to go directly from a data.frame or matrix to a LaTeX table. cor2latex will find the correlations and then create a lower (or upper) triangular matrix for latex output. cor2rtf will do the same for rtf output. fa2latex and fa2rtf will create the latex commands for showing the loadings and factor intercorrelations. As the default option, tables are prepared in an approximation of APA format.
df2latex(x,digits=2,rowlabels=TRUE,apa=TRUE,short.names=TRUE,font.size ="scriptsize",
   big.mark=NULL,drop.na=TRUE, heading="A table from the psych package in R",
   caption="df2latex",label="default", char=FALSE,
   stars=FALSE,silent=FALSE,file=NULL,append=FALSE,cut=0,big=0,abbrev=NULL,long=FALSE)

cor2latex(x,use = "pairwise", method="pearson", adjust="holm",stars=FALSE,
   digits=2,rowlabels=TRUE,lower=TRUE,apa=TRUE,short.names=TRUE,
   font.size ="scriptsize", heading="A correlation table from the psych package in R.",
   caption="cor2latex",label="default",silent=FALSE,file=NULL,append=FALSE,cut=0,big=0)

fa2latex(f,digits=2,rowlabels=TRUE,apa=TRUE,short.names=FALSE,cumvar=FALSE,
   cut=0,big=.3,alpha=.05,font.size ="scriptsize",long=FALSE,
   heading="A factor analysis table from the psych package in R",
   caption="fa2latex",label="default",silent=FALSE,file=NULL,append=FALSE)

omega2latex(f,digits=2,rowlabels=TRUE,apa=TRUE,short.names=FALSE,cumvar=FALSE,cut=.2,
   big=.3,font.size ="scriptsize",
   heading="An omega analysis table from the psych package in R",
   caption="omega2latex",label="default",silent=FALSE,file=NULL,append=FALSE)

irt2latex(f,digits=2,rowlabels=TRUE,apa=TRUE,short.names=FALSE,
   font.size ="scriptsize", heading="An IRT factor analysis table from R",
   caption="fa2latex",label="default",silent=FALSE,file=NULL,append=FALSE)

ICC2latex(icc,digits=2,rowlabels=TRUE,apa=TRUE,ci=TRUE,
   font.size ="scriptsize",big.mark=NULL, drop.na=TRUE,
   heading="A table from the psych package in R",
   caption="ICC2latex",label="default",char=FALSE,silent=FALSE,file=NULL,append=FALSE)

#not all options are yet implemented in these next three functions.
df2rtf(x,file=NULL, digits=2,rowlabels=TRUE,width=8.5,old=NULL,
   apa=TRUE,short.names=TRUE, font.size =10,big.mark=NULL, drop.na=TRUE,
   heading="A table from the psych package in R",
   caption="Created with df2rtf",label="default",char=FALSE,stars=FALSE,silent=FALSE,
   append=FALSE,cut=0,big=.0,abbrev=NULL,long=FALSE)

cor2rtf(x,file=NULL, use = "pairwise", method="pearson", adjust="holm", digits=2,
   rowlabels=TRUE,width=8.5,lower=TRUE,old=NULL,
   apa=TRUE,short.names=TRUE, font.size =10,big.mark=NULL, drop.na=TRUE,
   heading="A correlation matrix from the psych package in R",
   caption="Created with cor2rtf. left justify output if stars",
   label="default",char=FALSE,stars=FALSE,silent=FALSE,
   append=FALSE,cut=0,big=.0,abbrev=NULL,long=FALSE)

fa2rtf(f,file=NULL, use = "pairwise", method="pearson", adjust="holm", digits=2,
   rowlabels=TRUE,width=8.5,lower=TRUE,old=NULL,
   apa=TRUE,short.names=TRUE, font.size =10,big.mark=NULL, drop.na=TRUE,
   heading="A Factor analysis from the psych package in R",
   caption="Created with fa2rtf. ",label="default",char=FALSE,silent=FALSE,
   append=FALSE,cut=0,big=.0,abbrev=NULL)
x |
A data frame or matrix to convert to LaTeX. If non-square, then correlations will be found prior to printing in cor2latex |
digits |
Round the output to digits of accuracy. NULL for formatting character data |
abbrev |
How many characters should be used in column names –defaults to digits + 3 |
rowlabels |
If TRUE, use the row names from the matrix or data.frame |
short.names |
Name the columns with abbreviated rownames to save space |
apa |
If TRUE formats table in APA style |
cumvar |
For factor analyses, should we show the cumulative variance accounted for? |
font.size |
e.g., "scriptsize", "tiny" or any other acceptable LaTeX font size. |
heading |
The label appearing at the top of the table |
caption |
The table caption |
lower |
in cor2latex, just show the lower triangular matrix |
f |
The object returned from a factor analysis using |
label |
The label for the table |
big.mark |
Comma separate large numbers (big.mark=",") |
drop.na |
Do not print NA values |
method |
When finding correlations, which method should be used (pearson) |
use |
use="pairwise" is the default when finding correlations in cor2latex |
adjust |
If showing probabilities, which adjustment should be used (holm) |
stars |
Should probability 'magic asterisks' be displayed in cor2latex (FALSE) |
char |
char=TRUE allows printing tables with character information, but does not allow for putting in commas into numbers |
cut |
In omega2latex, df2latex and fa2latex, do not print abs(values) < cut |
big |
In fa2latex and df2latex boldface those abs(values) > big |
alpha |
If fa has returned confidence intervals, then what values of loadings should be boldfaced? |
icc |
Either the output of an ICC, or the data to be analyzed. |
ci |
Should confidence intervals of the ICC be displayed |
silent |
If TRUE, do not print any output, just return silently – useful if using Sweave |
file |
If specified, write the output to this file |
append |
If file is specified, then should we append (append=TRUE) or just write to the file |
long |
If TRUE, then do long tables (requires the longtable package in LaTeX) |
old |
When appending output with df2rtf, old is the output from the prior run. |
width |
page width in inches for df2rtf |
A LaTeX table. Note that if showing "stars" for correlations, then one needs to use the siunitx package in LaTeX. The entire LaTeX output is also returned invisibly. If using Sweave to create tables, then the silent option should be set to TRUE and the returned object saved as a file. See the last example.
Finally, some users have asked for the ability to convert these output tables into HTML. This may be done using the tth package.
Three functions to write to rtf files (for use in various proprietary word processing languages) have been added with version 2.4.3. These will write to an rtf file and may be formatted directly. df2rtf takes a data frame and writes it as a table with header information.
cor2rtf will take either a data matrix (and find the correlations) or just a correlation matrix. "Magic asterisks" can be added to the correlations using the stars=TRUE option. In this case, the result table can be left justified in a word processing language to get the numbers to appear correctly justified.
fa2latex and fa2rtf can take the output from either a factor analysis or from fa.lookup.
William Revelle with suggestions from Jason French and David Condon and Davide Morselli
The many LaTeX conversion routines in Hmisc.
To convert these LaTeX objects to HTML, you should install the tth package.
Consider the last example for creating HTML
df2latex(psych::Thurstone,rowlabels=FALSE,apa=FALSE,short.names=FALSE,
   caption="Thurstone Correlation matrix")
df2latex(psych::Thurstone,heading="Thurstone Correlation matrix in APA style")
df2latex(psych::describe(psych::sat.act)[2:10],short.names=FALSE)
cor2latex(psych::Thurstone)
cor2latex(psych::sat.act,short.names=FALSE)
fa2latex(psych::fa(psych::Thurstone,3),heading="Factor analysis from R in quasi APA style")

#to write to rtf file
#replace the temporary file name with something more useful
fn <- tempfile(pattern="example",fileext=".rtf")  #create a temporary file
#better is to create a local file
# e.g. fn <- "rtf_example.rtf"
cor2rtf(sat.act, file=fn)  #write to the file
dd <- psych::describe(sat.act)
temp <- df2rtf(dd, file=fn, append=TRUE, width=12)  #write and keep open
temp1 <- cor2rtf(sat.act,old=temp,caption=date(), append=TRUE)  #use date as caption
cor2rtf(sat.act, old=temp1, stars=TRUE)  #close the file
#now open this with your word processor and reformat with left justify

#now write a factor analysis output to an output file
# e.g. fn <- "rtf_example.rtf"
f5 <- psych::fa(bfi,5)
temp <- fa2rtf(f5, width=12, file=fn, append=TRUE)  #a normal fa output
fl <- psych::fa.lookup(f5, dictionary=bfi.dictionary)
fa2rtf(fl, old = temp)
##now open this with your word processor

#To convert these latex tables to HTML
#f3.lat <- fa2latex(psych::fa(psych::Thurstone,3),
#     heading="Factor analysis from R in quasi APA style")
#library(tth)
#f3.ht <- tth(f3.lat)
#print(as.data.frame(f3.ht),row.names=FALSE)
###
#If using Sweave to create a LaTeX table as a separate file then set silent=TRUE
#e.g.,
#LaTeX preamble
#....
#<<print=FALSE,echo=FALSE>>=
#f3 <- fa(Thurstone,3)
#fa2latex(f3,silent=TRUE,file='testoutput.tex')
#@
#
#\input{testoutput.tex}
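The general idea of turning the lower triangle of a correlation matrix into LaTeX table rows can be sketched in base R. The helper lower_tri_latex below is hypothetical and written only for this illustration; it is not part of psychTools and produces just the body rows, without the APA style header and footer that cor2latex adds. The mtcars variables are used only because they ship with R.

```r
# Hypothetical helper (not part of psychTools): format the lower triangle
# of a correlation matrix as the body rows of a LaTeX tabular
lower_tri_latex <- function(r, digits = 2) {
  cells <- formatC(r, digits = digits, format = "f")  # character matrix, same shape
  cells[upper.tri(cells)] <- ""                       # blank out the upper triangle
  rows <- apply(cells, 1, paste, collapse = " & ")    # join the cells of each row
  paste0(rownames(r), " & ", rows, " \\\\")           # row label first, \\ at the end
}

r <- cor(mtcars[, c("mpg", "hp", "wt")])
out <- lower_tri_latex(r)
cat(out, sep = "\n")
```

These rows would still need to be wrapped in a tabular environment with a column specification to compile.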
Although order will order a vector, and it is possible to order several columns of a data.frame by specifying each column individually in the call to order, dfOrder will order a data frame or matrix by as many columns as desired. The default is to sort by columns in lexicographic order. If the object is a correlation matrix, then the selected columns are sorted by the (absolute) maximum value across the columns (similar to fa.lookup in psych), and both rows and columns are sorted.
dfOrder(object, columns,absolute=FALSE,ascending=TRUE)
object |
The data.frame or matrix to be sorted |
columns |
Column numbers or names to use for sorting. If positive, then they will be sorted in increasing order. If negative, then in decreasing order |
absolute |
If TRUE, then sort the absolute values |
ascending |
By default, order from smallest to largest. |
This is just a simple helper function to reorder data.frames and correlation matrices. Originally developed to organize IRT output from the ltm package. It is a basic add on to the order function.
(Completely rewritten for version 1.8.1 and then again for 2.2.1 to allow sorting correlation matrices by numeric values.)
The original data frame is now in sorted order. If the input is a correlation matrix, the output is sorted by rows and columns.
William Revelle
Other useful file manipulation functions include read.file to read in data from a file or read.clipboard from the clipboard, fileScan, filesList, filesInfo, and fileCreate.
dfOrder code is used in the test.irt function to combine ltm and sim.irt output.
#create a data frame and then sort it in lexicographic order
set.seed(42)
x <- matrix(sample(1:4,64,replace=TRUE),ncol=4)
dfOrder(x)  # sort by all columns
dfOrder(x,c(1,4))  #sort by the first and 4th column
x.df <- data.frame(x)
dfOrder(x.df,c(1,-2))  #sort by the first in increasing order,
                       #the second in decreasing order
#now show sorting correlation matrices
r <- cor(sat.act,use="pairwise")
r.ord <- dfOrder(r,columns=c("education","ACT"),ascending=FALSE)
psych::corPlot(r.ord)
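For comparison, the same kind of multi-column sort can be written with base R's order, naming each column individually. This is a rough base R analogue of sorting by c(1,-2), shown only for illustration; it is not the dfOrder code.

```r
set.seed(42)
x <- data.frame(matrix(sample(1:4, 64, replace = TRUE), ncol = 4))

# first column increasing; ties broken by the second column, decreasing
x.sorted <- x[order(x[[1]], -x[[2]]), ]
head(x.sorted)
```

With many sort columns the call to order grows quickly, which is the convenience that dfOrder is meant to provide.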
Marco Del Giudice criticized an earlier study by Simonton for using partial regression weights to estimate the importance of various predictors of rated eminence. This is a nice example of the (mis)interpretation of beta weights of highly correlated predictors.
data("eminence")
A data frame with 69 observations on the following 9 variables.
name
a character vector
reputation
Log of rated reputation
birth.year
Year of birth
first.year
Year of first cited publication
last.year
Year of last cited publication
works
Log of number of publications
citations
Log of number of citations
composite
A composite index of publications
h
The 'h' index of citations
Simonton (1997, 2014) discusses various estimates of eminence among 69 psychologists born between 1842 and 1912, reports that the regression weights are small, and interprets this as meaning that the number of publications and citations are not very important. Del Giudice (2020) points out that citations and the number of publications are highly collinear and thus, while their independent contributions are small, their joint effect is quite large (R = .69). These data are given here as an example of multiple correlation and partial correlation.
Del Giudice (2020) links to a web page with the data.
Marco Del Giudice (2020). How Well Do Bibliometric Indicators Correlate With Scientific Eminence? A Comment on Simonton (2016). Perspectives on Psychological Science, 15, 202-203.
Simonton, D. K. (1992). Leaders of American psychology, 1879-1967: Career development, creative output, and professional achievement. Journal of Personality and Social Psychology, 62, 5-17.
Simonton, D. K. (2016). Giving credit where credit is due: Why it's so hard to do in psychological science. Perspectives on Psychological Science, 11, 888-892.
data(eminence)
psych::lowerCor(eminence)
cs <- psych::cs
psych::partial.r(eminence, x= cs(reputation, works, citations),y=cs(birth.year))
psych::setCor(reputation ~ works + h + first.year,data=eminence)
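The point about collinear predictors can be illustrated with simulated data in base R. Everything in this sketch is an assumption made for illustration: the variables are generated from a shared component plus noise and merely borrow the names works, citations and reputation; they are not the eminence data, and the sample size and noise levels are arbitrary.

```r
set.seed(1)
n <- 2000
common <- rnorm(n)                              # shared component
works      <- common + rnorm(n, sd = 0.3)       # two highly collinear predictors
citations  <- common + rnorm(n, sd = 0.3)
reputation <- common + rnorm(n)                 # criterion

z <- data.frame(scale(cbind(reputation, works, citations)))  # standardize
fit <- lm(reputation ~ works + citations, data = z)

beta <- coef(fit)[-1]                           # standardized regression weights
zero.order <- cor(z)["reputation", c("works", "citations")]
R <- sqrt(summary(fit)$r.squared)               # multiple correlation

round(rbind(beta, zero.order), 2)
round(R, 2)
```

Each beta weight is clearly smaller than the corresponding zero order correlation because the two predictors share most of their variance, yet the multiple R is at least as large as either zero order correlation.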
The EPI is and has been a very frequently administered personality test, with 57 items measuring two broad dimensions, Extraversion-Introversion and Stability-Neuroticism, and an additional Lie scale. It was developed by Eysenck and Eysenck (1964) and eventually replaced by the EPQ, which measures three broad dimensions. This data set represents 3570 observations collected in the early 1990s at the Personality, Motivation and Cognition lab at Northwestern. An additional data set (epiR) has test and retest information for 474 participants. The data are included here as a demonstration of scale construction and test-retest reliability.
data(epi)
data(epi.dictionary)
data(epiR)
A data frame with 3570 observations on the following 57 variables.
id
The identification number within the study
time
First (group testing) or 2nd time (before a lab experiment) for the epiR data set.
study
Four lab based studies and their pretest data
V1
a numeric vector
V2
a numeric vector
V3
a numeric vector
V4
a numeric vector
V5
a numeric vector
V6
a numeric vector
V7
a numeric vector
V8
a numeric vector
V9
a numeric vector
V10
a numeric vector
V11
a numeric vector
V12
a numeric vector
V13
a numeric vector
V14
a numeric vector
V15
a numeric vector
V16
a numeric vector
V17
a numeric vector
V18
a numeric vector
V19
a numeric vector
V20
a numeric vector
V21
a numeric vector
V22
a numeric vector
V23
a numeric vector
V24
a numeric vector
V25
a numeric vector
V26
a numeric vector
V27
a numeric vector
V28
a numeric vector
V29
a numeric vector
V30
a numeric vector
V31
a numeric vector
V32
a numeric vector
V33
a numeric vector
V34
a numeric vector
V35
a numeric vector
V36
a numeric vector
V37
a numeric vector
V38
a numeric vector
V39
a numeric vector
V40
a numeric vector
V41
a numeric vector
V42
a numeric vector
V43
a numeric vector
V44
a numeric vector
V45
a numeric vector
V46
a numeric vector
V47
a numeric vector
V48
a numeric vector
V49
a numeric vector
V50
a numeric vector
V51
a numeric vector
V52
a numeric vector
V53
a numeric vector
V54
a numeric vector
V55
a numeric vector
V56
a numeric vector
V57
a numeric vector
The original data were collected in a group testing framework for screening participants for subsequent studies. The participants were enrolled in an introductory psychology class between Fall, 1991 and Spring, 1995.
The actual items may be found in the epi.dictionary.
The structure of the E scale has been shown by Rocklin and Revelle (1981) to have two subcomponents, Impulsivity and Sociability. These were subsequently used by Revelle, Humphreys, Simon and Gilliland (1980) to examine the relationship between personality, caffeine induced arousal, and cognitive performance.
The epiR data include the original group testing data and matched data for 474 participants collected several weeks later. This is useful for showing that internal consistency estimates (e.g. alpha or omega) can be low even though the test is stable across time. For more demonstrations of the distinction between immediate internal consistency and delayed test-retest reliability see the msqR and sai data sets and testRetest.
Data from the PMC laboratory at Northwestern.
Eysenck, H.J. and Eysenck, S. B.G. (1968). Manual for the Eysenck Personality Inventory. Educational and Industrial Testing Service, San Diego, CA.
Revelle, W. and Humphreys, M. S. and Simon, L. and Gilliland, K. (1980) Interactive effect of personality, time of day, and caffeine: A test of the arousal model, Journal of Experimental Psychology General, 109, 1, 1-31.
data(epi)
epi.keys <- list(E = c("V1", "V3", "V8", "V10", "V13", "V17", "V22", "V25", "V27", "V39",
      "V44", "V46", "V49", "V53", "V56", "-V5", "-V15", "-V20", "-V29", "-V32",
      "-V34","-V37", "-V41", "-V51"),
   N = c( "V2", "V4", "V7", "V9", "V11", "V14", "V16", "V19", "V21", "V23", "V26",
      "V28", "V31", "V33", "V35", "V38", "V40","V43", "V45", "V47", "V50",
      "V52","V55", "V57"),
   L = c("V6", "V24", "V36", "-V12", "-V18", "-V30", "-V42", "-V48", "-V54"),
   Imp = c( "V1", "V3", "V8", "V10", "V13", "V22", "V39", "-V5", "-V41"),
   Soc = c( "V17", "V25", "V27", "V44", "V46", "V53", "-V11", "-V15", "-V20",
      "-V29", "-V32", "-V37", "-V51")
   )
scores <- psych::scoreItems(epi.keys,epi)
psych::keys.lookup(epi.keys[1:3],epi.dictionary)  #show the items and keying information

#a variety of demonstrations (not run) of test retest reliability versus alpha versus omega
E <- psych::selectFromKeys(epi.keys$E)
#look at the testRetest help file for more examples
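How a keys vector with reverse-keyed items (those with a leading "-") turns into a scale score can be sketched in base R. This is a toy illustration under stated assumptions, not the psych::scoreItems implementation: the responses are hypothetical yes/no items coded 1/0 (the actual coding of the epi items is not assumed here), and only the mean score is computed, without the reliability statistics that scoreItems also reports.

```r
# Toy responses (hypothetical, not the epi data): 4 people, 3 yes/no items coded 1/0
items <- data.frame(V1 = c(1, 0, 1, 1),
                    V2 = c(0, 0, 1, 1),
                    V3 = c(0, 1, 0, 1))

keys <- c("V1", "V2", "-V3")          # a leading "-" marks a reverse-keyed item
reversed <- startsWith(keys, "-")     # which items are reversed
vars <- sub("^-", "", keys)           # item names without the sign

scored <- items[vars]
# reverse keying: (max + min) - response, which is 1 - response for 0/1 items
scored[reversed] <- 1 - scored[reversed]

scale.score <- rowMeans(scored)       # mean of the (re)keyed items per person
scale.score
```

For items with a different response range, the constant 1 would be replaced by the sum of the minimum and maximum possible responses.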
A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.
data(epi.bfi)
A data frame with 231 observations on the following 13 variables.
epiE
EPI Extraversion
epiS
EPI Sociability (a subset of Extraversion items)
epiImp
EPI Impulsivity (a subset of Extraversion items)
epilie
EPI Lie scale
epiNeur
EPI neuroticism
bfagree
Big 5 inventory (from the IPIP) measure of Agreeableness
bfcon
Big 5 Conscientiousness
bfext
Big 5 Extraversion
bfneur
Big 5 Neuroticism
bfopen
Big 5 Openness
bdi
Beck Depression scale
traitanx
Trait Anxiety
stateanx
State Anxiety
Self report personality scales tend to measure the "Giant 2" of Extraversion and Neuroticism or the "Big 5" of Extraversion, Neuroticism, Agreeableness, Conscientiousness, and Openness. Here is a small data set from Northwestern University undergraduates with scores on the Eysenck Personality Inventory (EPI) and a Big 5 inventory taken from the International Personality Item Pool.
Data were collected at the Personality, Motivation, and Cognition Lab (PMCLab) at Northwestern by William Revelle.
https://personality-project.org/pmc.html
data(epi.bfi)
psych::pairs.panels(epi.bfi[,1:5])
psych::describe(epi.bfi)
Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. This is the data set for the Galton height.
data(galton)
A data frame with 928 observations on the following 2 variables.
parent
Mid Parent heights (in inches)
child
Child Height
Female heights were adjusted by 1.08 to compensate for sex differences. (This was done in the original data set)
This is just the galton data set from UsingR, slightly rearranged.
Stigler, S. M. (1999). Statistics on the Table: The History of Statistical Concepts and Methods. Harvard University Press.
Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute of Great Britain and Ireland, 15:246-263.
Galton, F. (1869). Hereditary Genius: An Inquiry into its Laws and Consequences. London: Macmillan.
Wachsmuth, A.W., Wilkinson L., Dallal G.E. (2003). Galton's bend: A previously undiscovered nonlinearity in Galton's family stature regression data. The American Statistician, 57, 190-192.
The other Galton data sets: heights, peas, cubits
data(galton)
psych::describe(galton)
#show the scatter plot and the lowess fit
psych::pairs.panels(galton,main="Galton's Parent child heights")
#but this makes the regression lines look the same
psych::pairs.panels(galton,lm=TRUE,main="Galton's Parent child heights")
#better is to scale them
psych::pairs.panels(galton,lm=TRUE,xlim=c(62,74),ylim=c(62,74),
   main="Galton's Parent child heights")
Gruber et al. (2020) report on the psychometric properties of a multifaceted Gender Related Attributes Survey. Here are the data from their 3 domains (Personality, Cognition, and Activities and Interests) from their study 2. Eagly and Revelle (2022) include these data in their review of the power of aggregation. The data are included here as demonstrations of the cohen.d and scatterHist functions in the psych package and may be used to show the power of aggregation.
data("GERAS")
#These other objects are included in the file
# data("GERAS.scales")
# data("GERAS.dictionary")
# data("GERAS.items")
# data("GERAS.keys")
A data frame with 471 observations on the following 51 variables (selected from the original 93). The code numbers are item numbers from the bigger set.
V15
reckless
V22
willing to take risks
V11
courageous
V6
adventurous
V19
dominant
V14
controlling
V20
boastful
V21
rational
V23
analytical
V9
pragmatic
V44
to find an address for the first time
V45
to find a way again
V46
to understand equations
V50
to follow directions
V51
to understand equations
V53
day-to-day calculations
V48
to write a computer program
V69
paintball
V73
driving go-cart
V71
drinking beer
V68
watching action movies
V75
playing cards (poker)
V72
watching sports on TV
V67
doing certain sports (e.g. soccer, ...)
V74
Gym (weightlifting)
V27
warm-hearted
V28
loving
V29
caring
V26
compassionate
V32
delicate
V30
tender
V24
family-oriented
V40
anxious
V39
thin-skinned
V41
careful
V55
to explain foreign words
V58
to find the right words to express certain content
V59
synonyms for a word in order to avoid repetitions
V60
to phrase a text
V54
remembering events from your own life
V63
to notice small changes
V57
to remember names and faces
V89
shopping
V92
gossiping
V81
watching a romantic movie
V80
talking on the phone with a friend
V90
yoga
V83
rhythmic gymnastics
V84
going for a walk
V86
dancing
gender
gender (M=1 F=2)
These 50 items (+ gender) may be formed into scales using the GERAS.keys. The first 10 items are Male Personality, the next 10 are Female Personality, then 7 and 7 M and F Cognition, then 8 and 8 M and F Activity items. The Pers, Cog and Act scales are formed from the M - F scales for the three domains. M and F are the composites of the Male and then the Female scales. MF.all is the composite of the M - F scales. See the GERAS.keys object for scoring directions. The scales in GERAS.scales are:
"M.pers" "F.pers" "M.cog" "F.cog" "M.act" "F.act" "Pers" "Cog" "Act" "M" "F" "MF.all" "gender"
See the Athenstaedt data set for a related data set.
Study 2 data downloaded from the Open Science Framework https://osf.io/42jhr/. Used by kind permission of Freya M. Gruber, Tuulia Ortner, and Belinda A. Pletzer.
Alice H. Eagly and William Revelle (2022), Understanding the Magnitude of Psychological Differences Between Women and Men Requires Seeing the Forest and the Trees. Perspectives on Psychological Science doi:10.1177/17456916211046006
Gruber, Freya M. and Distlberger, Eva and Scherndl, Thomas and Ortner, Tuulia M. and Pletzer, Belinda (2020) Psychometric properties of the multifaceted Gender-Related Attributes Survey (GERAS) European Journal of Psychological Assessment, 36, (4) 612-623.
data(GERAS)
GERAS.keys #show the keys
#show the items from the dictionary
psych::lookupFromKeys(GERAS.keys, GERAS.dictionary[,4,drop=FALSE])
#now, use the GERAS.scales to show a scatterHist plot showing univariate d and bivariate
# Mahalanobis D.
psych::scatterHist(F ~ M + gender, data=GERAS.scales, cex.point=.3,smooth=FALSE,
   xlab="Masculine Scale",ylab="Feminine Scale",correl=FALSE,
   d.arrow=TRUE,col=c("red","blue"), bg=c("red","blue"), lwd=4,
   title="Combined M and F scales",cex.cor=2,cex.arrow=1.25, cex.main=2)
Erik Nisbet reported the relationship between emotions, ideology, and party affiliation as predictors of attitudes towards government action on climate change. The data were used by Hayes (2013) in a discussion of regression. They are available as the glbwarm data set in the processR package. They are copied here for examples of mediation.
data("globalWarm")
A data frame with 815 observations on the following 7 variables.
govact
Support for government action
posemot
Positive emotions about climate change
negemot
Negative emotions about climate change
ideology
Political ideology (Liberal to conservative)
age
age
sex
female =0, male =1
partyid
Democratic =1, Independent =2, Republican =3
This data set is discussed as an example of regression in Hayes (2013) p 24 - 30 and elsewhere. It is a nice example of moderated regression. It was collected by Erik Nisbet (no citation) who studies communication and the media. E. Nisbet is currently on the faculty at Northwestern School of Communication.
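To make the idea of moderated regression concrete, here is a minimal base R sketch. It uses simulated data only: the variable names echo those in globalWarm, but the values and the population coefficients are assumptions chosen for illustration, and lm is used rather than the psych functions shown in the examples.

```r
# Simulated data only: names echo globalWarm, values are invented
set.seed(42)
n <- 500
negemot <- rnorm(n)
age <- rnorm(n)

# Assumed population model with a negemot x age interaction of 0.3
govact <- 0.4 * negemot + 0.1 * age + 0.3 * negemot * age + rnorm(n)
sim <- data.frame(govact, negemot, age)

# negemot * age expands to negemot + age + negemot:age
fit <- lm(govact ~ negemot * age, data = sim)
b <- coef(fit)

# Conditional (simple) slope of negemot at chosen values of age:
# slope = b_negemot + b_interaction * age
age_values <- c(-1, 0, 1)
simple_slopes <- b["negemot"] + b["negemot:age"] * age_values
names(simple_slopes) <- paste0("age=", age_values)
simple_slopes
```

The interaction coefficient describes how the effect of one predictor changes across levels of the other, which is why the simple slopes differ across the three chosen ages.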
The raw data are available from the processR package (Keon-Woong Moon, 2020) as the glbwarm data set as well as from Hayes' website. The data set is used by Hayes in several examples. Used here by kind permission of Erik Nisbet.
Although the processR package has been removed from CRAN, an earlier version had the data.
Hayes, Andrew F. (2013) Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press.
Moon, K. (2023). processR: Implementation of the 'PROCESS' Macro. R package version 0.2.8.
data(globalWarm)
psych::lowerCor(globalWarm)
#compare to Hayes p 254-258
psych::lmCor(govact ~ negemot * age + posemot +ideology+sex,data=globalWarm,std=FALSE)
Francis Galton introduced the 'co-relation' in 1888 with a paper discussing how to measure the relationship between two variables. His primary example was the relationship between height and forearm length. The data table (cubits) is taken from Galton (1888). Unfortunately, there seem to be some errors in the original data table in that the marginal totals do not match the table.
The data frame, heights, is converted from this table using table2df.
data(heights)
A data frame with 348 observations on the following 2 variables.
height
Height in inches
cubit
Forearm length in inches
Sir Francis Galton (1888) published the first demonstration of the correlation coefficient. The regression (or reversion to mediocrity) of the height to the length of the left forearm (a cubit) was found to be .8. The original table cubits is taken from Galton (1888). There seem to be some errors in the table as published in that the published row totals do not agree with the actual row sums. These data are used to create a matrix using table2matrix for demonstrations of analysis and displays of the data.
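The conversion from a frequency table to one row per person can be illustrated in base R. This is only a sketch of the general idea on an invented 2 x 3 table; it is not the implementation of table2df or table2matrix, and the counts are hypothetical.

```r
# Hypothetical 2 x 3 frequency table (counts are invented)
tab <- matrix(c(2, 1, 0,
                0, 3, 4), nrow = 2, byrow = TRUE,
              dimnames = list(height = c("64", "70"),
                              cubit  = c("17", "18", "19")))

# Long format: one row per cell, with its count in column Freq
cells <- as.data.frame(as.table(tab), stringsAsFactors = FALSE)

# Repeat each cell's row Freq times to get one row per person
cases <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("height", "cubit")]
cases$height <- as.numeric(cases$height)
cases$cubit  <- as.numeric(cases$cubit)
rownames(cases) <- NULL

nrow(cases)              # one row per counted person
cor(cases$height, cases$cubit)
```

Once the data are in case form, ordinary functions such as cor or a scatter plot can be applied directly.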
Galton (1888)
Galton, Francis (1888) Co-relations and their measurement. Proceedings of the Royal Society. London Series, 45, 135-145.
table2matrix, table2df, cubits, ellipses, galton
data(heights)
psych::ellipses(heights,n=1,main="Galton's co-relation data set")
A classic data set in psychometrics is that from Holzinger and Swineford (1939). Four- and five-factor solutions to 24 of these variables are presented by Harman (1976), and 9 of the variables are used by the lavaan package. The two data sets were supplied by Keith Widaman.
data(holzinger.swineford)
data(holzinger.raw)
data(holzinger.dictionary)
A data frame with 301 observations on the following 33 variables. Longer descriptions taken from Thompson (1998).
case
a numeric vector
school
School Pasteur or Grant-White
grade
Grade (7 or 8)
female
male = 1, female = 2
ageyr
age in years
mo
months over year
agemo
Age in months
t01_visperc
Visual perception test from Spearman VPT Part I
t02_cubes
Cubes, Simplification of Brigham's Spatial Relations Test
t03_frmbord
Paper formboard-Shapes that can be combined to form a target
t04_lozenges
Lozenges from Thorndike-Shapes flipped over then identify target
t05_geninfo
General Information Verbal Test
t06_paracomp
Paragraph Comprehension Test
t07_sentcomp
Sentence Completion Test
t08_wordclas
Word classification-Which word does not belong in set
t09_wordmean
Word Meaning Test
t10_addition
Speeded addition test
t11_code
Speeded codetest-Transform shapes into alpha with code
t12_countdot
Speeded counting of dots in shape
t13_sccaps
Speeded discrimination of straight and curved caps
t14_wordrecg
Memory of Target Words
t15_numbrecg
Memory of Target Numbers
t16_figrrecg
Memory of Target Shapes
t17_objnumb
Memory of object-Number association targets
t18_numbfig
Memory of number-Object association targets
t19_figword
Memory of figure-Word association target
t20_deduction
Deductive Math Ability
t21_numbpuzz
Math number puzzles
t22_probreas
Math word problem reasoning
t23_series
Completion of a Math Number Series
t24_woody
Woody-McCall mixed math fundamentals test
t25_frmbord2
Revision of t3-Paper form board
t26_flags
Flags-possible substitute for t4 lozenges
The following commentary was provided by Keith Widaman:
“The Holzinger and Swineford (1939) data have been used as a model data set by many investigators. For example, Harman (1976) used the “24 Psychological Variables” example prominently in his authoritative text on multiple factor analysis, and the data presented under this rubric consisted of 24 of the variables from the Grant-White school (N = 145). Meredith (1964a, 1964b) used several variables from the Holzinger and Swineford study in his work on factorial invariance under selection. Joreskog (1971) based his work on multiple-group confirmatory factor analysis using the Holzinger and Swineford data, subsetting the data into four groups.
Rosseel, who developed the ‘lavaan’ package for R, included 9 of the manifest variables from Holzinger and Swineford (1939) as a “resident” data set when one downloads the ‘lavaan’ package. Several background variables are included in this “resident” data set in addition to 9 of the psychological tests (which are named x1 – x9 in the data set). When analyzing these data, I found the distributions of the variables (means, SDs) did not match the sample statistics from the original article. For example, in the “resident” data set in ‘lavaan’, scores on all manifest variables ranged between 0 and 10, sample means varied between 3 and 6, and sample SDs varied between 1.0 and 1.5. In the original data set, score ranges were rather different across tests, with some variables having scores that ranged between 0 and 20, but other manifest variables having scores ranging from 50 to over 300 – with obvious attendant differences in sample means and SDs.
After a bit of snooping (i.e., data analysis), I discovered that the 9 variables in the “resident" data set in ‘lavaan’ had been rescored through ratio transformations. The ratio transformations involved dividing the raw score for each person on a given test by a particular constant for that test that transformed scores on the test to have the desired range.
I decided to perform transformations of all 26 variables so that two data sets could be available to interested researchers:”
holzinger.raw are the raw scores on all variables from Holzinger & Swineford (1939).
holzinger.swineford are rescaled scores on all variables from Holzinger & Swineford.
holzinger.dictionary is a list of the variable names in short and long form.
... Widaman continues:
“As several persons have noted, Harman (1976) used data only from the Grant-White school (N = 145) for his 24 Psychological Variables data set. In doing so, Harman replaced t03_frmbord and t04_lozenges with t25_frmbord2 and t26_flags, because the latter two tests were experimental tests that were designed to be more appropriate for this age level. This substitution is fine, as long as one analyzes data from only the Grant-White school. If one wishes to perform multiple-group analyses and uses school as a grouping variable (as Meredith, 1964a, 1964b, and Joreskog, 1971, did), then tests 25 and 26 should not be used.”
“As have others, Gorsuch (1983) mentioned that analyses based on the raw data reported by Holzinger and Swineford (1939) will not produce statistics (means, SDs, correlations) that match precisely the values reported by Holzinger and Swineford or Harman (1976). Following Gorsuch, I have assumed that the raw data are correct. Applying factor analytic techniques to the raw data from the Grant-White school and to the summary data reported by Harman (1976) will produce slightly different results, but results that differ in only minor, unimportant details.”
These data are interesting not just for the historical completeness of having the original data, but also as an example of suppressor variables. Age and grade are positively correlated, and scores are higher in the 8th grade than in the 7th grade. But age (particularly in months) is negatively correlated with many of the cognitive tasks, and when grade and age are both entered into regression, this negative correlation is enhanced. That is, although increasing grade increases cognitive performance, younger children in both grades do better than the older children.
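The suppression pattern described above can be reproduced in a small simulation. The sketch below is an assumption-laden illustration, not an analysis of the Holzinger-Swineford data: the sample size, the age distribution, and the effect sizes are all invented so that children who are old for their grade score lower.

```r
# Simulated data only; all values and effect sizes are assumptions
set.seed(123)
n_per_grade <- 150
grade <- rep(c(7, 8), each = n_per_grade)

# Age in months: eighth graders are about a year older on average
expected_age <- 150 + 12 * (grade - 7)
agemo <- expected_age + rnorm(2 * n_per_grade, sd = 6)

# Higher grade raises the score; being old for one's grade lowers it
rel_age <- agemo - expected_age
score <- 1 * (grade - 7) - 0.2 * rel_age + rnorm(2 * n_per_grade)
sim <- data.frame(grade, agemo, score)

round(cor(sim), 2)       # grade and age correlate positively

# Slope for age alone versus age with grade in the model
b_age_alone      <- coef(lm(score ~ agemo, data = sim))["agemo"]
b_age_with_grade <- coef(lm(score ~ grade + agemo, data = sim))["agemo"]
b_grade          <- coef(lm(score ~ grade + agemo, data = sim))["grade"]
c(alone = unname(b_age_alone), with_grade = unname(b_age_with_grade))
```

In this simulation the age slope becomes more negative once grade is entered, while the grade slope stays positive, which mirrors the pattern reported for the real data.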
As discussed by Widaman, the descriptive values reported in Harman (1967) (p 124) do not quite match the descriptive statistics in holzinger.raw. Further note that the correlation matrix and factor loadings are trivially different from the Harman.24 factor loadings in the GPArotation package.
The purpose behind presenting both the raw and transformed data is to show that the fit statistics from factor analysis are identical for these two data sets.
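The reason the two versions give the same factor analytic fit can be sketched as follows: dividing each test by a positive constant leaves the correlation matrix unchanged, and a factor analysis of the correlation matrix depends only on those correlations. The scores and divisors below are simulated and hypothetical; they are not the constants used for holzinger.swineford.

```r
# Simulated scores on three tests with very different ranges
set.seed(7)
t1 <- rnorm(100, mean = 30, sd = 7)
t2 <- rnorm(100, mean = 180, sd = 40) + 2 * t1   # induce some correlation
t3 <- rnorm(100, mean = 10, sd = 3)
raw <- cbind(t1 = t1, t2 = t2, t3 = t3)

# Hypothetical divisors, one per test (ratio transformation)
divisors <- c(t1 = 6, t2 = 40, t3 = 2)
rescaled <- sweep(raw, 2, divisors, "/")

# Means and SDs change, correlations do not
round(colMeans(raw), 2)
round(colMeans(rescaled), 2)
all.equal(cor(raw), cor(rescaled))
```

The same invariance holds for any positive rescaling of a single variable, so either data set may be used when only the correlational structure matters.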
The variables x1 ... x9 in the lavaan package correspond to tests 1, 2, 4, 6, 7, 9, 10, 12 and 13.
Keith Widaman (2019, personal communication). Original data from Holzinger and Swineford (1939).
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.
Harman, Harry Horace (1967), Modern factor analysis. Chicago, University of Chicago Press.
Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. Supplementary Educational Monographs, no. 48. Chicago: University of Chicago, Department of Education.
Joreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36, 409-426.
Meredith, W. (1964a). Notes on factorial invariance. Psychometrika, 29, 177-185.
Meredith, W. (1964b). Rotation to achieve factorial invariance. Psychometrika, 29, 187-206.
Meredith, W. (1977). On weighted Procrustes and hyperplane fitting in factor analytic rotation. Psychometrika, 42, 491-522.
Thompson, Bruce (1998). Five Methodology Errors in Educational Research: The Pantheon of Statistical Significance and Other Faux Pas. Paper presented at the Annual Meeting of the American Educational Research Association (San Diego, CA, April 13-17, 1998).
psych::Holzinger
data(holzinger.raw)
psych::describe(holzinger.raw)
data(holzinger.dictionary)
holzinger.dictionary #to see the longer names for these data (taken from Thompson)
#Compare these to the lavaan correlation matrix
psych::lowerCor(holzinger.swineford[ 7+ c(1, 2, 4, 6, 7, 9, 10, 12, 13)])
psych::lmCor(t01_visperc + t05_geninfo + t08_wordclas ~ grade + agemo,data = holzinger.raw)
psych::lmCor( t06_paracomp ~ grade + agemo, data=holzinger.swineford)
psych::mediate(t06_paracomp ~ grade + (agemo),data = holzinger.raw,std=TRUE)
#show the omega structure of the 24 variables
om4 <- psych::omega(holzinger.swineford[8:31],4)
psych::omega.diagram(om4,sl=FALSE,main="24 variables from Holzinger-Swineford")
#these data also show an interesting suppression effect
psych::lowerCor(holzinger.swineford[c(3,7,12:14)])
psych::lmCor( t06_paracomp ~ grade + agemo, data=holzinger.swineford)
#or show as a mediation effect
mod <- psych::mediate(t06_paracomp ~ grade + (agemo),data = holzinger.raw,std=TRUE,n.iter=50)
summary(mod)
#now, show a plot of these effects
plot(t07_sentcomp ~ agemo, col=c("red","blue")[holzinger.swineford$grade -6],
   pch=26-holzinger.swineford$grade,data=holzinger.swineford,
   ylab="Sentence Completion",xlab="Age in Months",
   main="Sentence Completion varies by age and grade")
#we use lmCor to figure out the lines
#note that we need to not plot the default graph
by(holzinger.swineford,holzinger.swineford$grade -6,function(x) abline(
   psych::lmCor(t07_sentcomp ~ agemo, data=x, std=FALSE, plot=FALSE),
   lty=c("dashed","solid")[x$grade-6]))
text(190,3.3,"grade = 8")
text(190,2,"grade = 7")
US census data on family income from 2008
data(income)
A data frame with 44 observations on the following 4 variables.
value
lower boundary of the income group
count
Number of families within that income group
mean
Mean of the category
prop
proportion of families
The distribution of income is a nice example of a log normal distribution. It is also an interesting example of the power of graphics. It is quite clear when graphing the data that income statistics are bunched to the nearest 5K. That is, there is a clear sawtooth pattern in the data.
The all.income set interpolates intervening values for the 100-150K, 150-200K, and 200-250K ranges.
US Census: Table HINC-06. Income Distribution to $250,000 or More for Households: 2008
https://www.census.gov/hhes/www/cpstables/032009/hhinc/new06_000.htm
data(income)
with(income[1:40,], plot(mean,prop, main="US family income for 2008",xlab="income",
   ylab="Proportion of families",xlim=c(0,100000)))
with(income[1:40,], points(lowess(mean,prop,f=.3),type="l"))
psych::describe(income)
with(all.income, plot(mean,prop, main="US family income for 2008",xlab="income",
   ylab="Proportion of families",xlim=c(0,250000)))
with(all.income[1:50,], points(lowess(mean,prop,f=.25),type="l"))
16 multiple choice ability items taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics. For more information on the development of an open source measure of cognitive ability, consult the readings available at https://personality-project.org/.
data(iqitems)
A data frame with 1525 observations on the following 16 variables. The number following the name is the item number from SAPA.
reason.4
Basic reasoning question
reason.16
Basic reasoning question
reason.17
Basic reasoning question
reason.19
Basic reasoning question
letter.7
In the following alphanumeric series, what letter comes next?
letter.33
In the following alphanumeric series, what letter comes next?
letter.34
In the following alphanumeric series, what letter comes next?
letter.58
In the following alphanumeric series, what letter comes next?
matrix.45
A matrix reasoning task
matrix.46
A matrix reasoning task
matrix.47
A matrix reasoning task
matrix.55
A matrix reasoning task
rotate.3
Spatial Rotation of type 1.2
rotate.4
Spatial Rotation of type 1.2
rotate.6
Spatial Rotation of type 1.1
rotate.8
Spatial Rotation of type 2.3
16 items were sampled from 80 items given as part of the SAPA (https://www.sapa-project.org/) project (Revelle, Wilt and Rosenthal, 2009; Condon and Revelle, 2014) to develop online measures of ability. These 16 items reflect four lower order factors (verbal reasoning, letter series, matrix reasoning, and spatial rotations). These lower level factors all share a higher level factor ('g'). Similar data are available from the International Cognitive Ability Resource at https://www.icar-project.org/.
This data set and the associated data set (ability), which is based upon scoring these multiple choice items and converting them to correct/incorrect, may be used to demonstrate item response functions, tetrachoric correlations, or irt.fa, as well as omega estimates of reliability and hierarchical structure.
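Converting multiple choice responses to correct/incorrect amounts to comparing each response with the keyed alternative for that item. The following base R sketch shows the idea on three invented items and four invented respondents; it treats 0 as a missing response, as the package examples do, but it is not the code used by score.multiple.choice.

```r
# Hypothetical responses: rows are people, columns are items,
# values are the alternative chosen (0 = no response)
responses <- data.frame(item1 = c(4, 2, 4, 0),
                        item2 = c(6, 6, 1, 6),
                        item3 = c(3, 3, 3, 5))

# Hypothetical key: the correct alternative for each item
key <- c(item1 = 4, item2 = 6, item3 = 3)

# Treat zero responses as missing before scoring
responses[responses == 0] <- NA

# Score each item: 1 if the response matches the key, 0 otherwise
scored <- as.data.frame(Map(function(resp, k) as.integer(resp == k),
                            responses, key))
scored

# Number correct per person, ignoring missing responses
total <- rowSums(scored, na.rm = TRUE)
total
```

The resulting 0/1 matrix is the form needed for tetrachoric correlations and item response analyses.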
In addition, the data set is a good example of doing item analysis to examine the empirical response probabilities of each item alternative as a function of the underlying latent trait. When doing this, it appears that two of the matrix reasoning problems do not have monotonically increasing trace lines for the probability correct. At moderately high ability (theta = 1) there is a dip in the probability correct relative to theta = 0 and theta = 2.
The example data set is taken from the Synthetic Aperture Personality Assessment personality and ability test at https://www.sapa-project.org/. The data were collected with David Condon from 8/08/12 to 8/31/12.
Condon, David and Revelle, William, (2014) The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52-64.
Revelle, William, Dworak, Elizabeth M. and Condon, David (2020) Cognitive ability in everyday life: the utility of open-source measures. Current Directions in Psychological Science, 29, (4) 358-363. Open access at doi:10.1177/0963721420922178.
Dworak, Elizabeth M., Revelle, William, Doebler, Philip and Condon, David (2021) Using the International Cognitive Ability Resource as an open source tool to explore individual differences in cognitive ability. Personality and Individual Differences, 169. Open access at doi:10.1016/j.paid.2020.109906.
Revelle, W., Wilt, J., and Rosenthal, A. (2010) Individual Differences in Cognition: New Methods for examining the Personality-Cognition Link In Gruszka, A. and Matthews, G. and Szymura, B. (Eds.) Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, Springer.
Revelle, W., Condon, D.M., Wilt, J., French, J.A., Brown, A., and Elleman, L.G. (2016) Web and phone based data collection using planned missing designs. In Fielding, N.G., Lee, R.M. and Blank, G. (Eds). SAGE Handbook of Online Research Methods (2nd Ed), Sage Publications.
data(iqitems)
iq.keys <- c(4,4,4, 6, 6,3,4,4, 5,2,2,4, 3,2,6,7)
psych::score.multiple.choice(iq.keys,iqitems) #this just gives summary statistics
#convert them to true false
iq.scrub <- psych::scrub(iqitems,isvalue=0) #first get rid of the zero responses
iq.tf <- psych::score.multiple.choice(iq.keys,iq.scrub,score=FALSE)
   #convert to wrong (0) and correct (1) for analysis
psych::describe(iq.tf)
#see the ability data set for these analyses
#now, for some item analysis
iq.irt <- psych::irt.fa(iq.tf) #do a basic irt
iq.sc <- psych::scoreIrt(iq.irt,iq.tf) #find the scores
op <- par(mfrow=c(4,4))
psych::irt.responses(iq.sc[,1], iq.tf)
op <- par(mfrow=c(1,1))
Emotions may be described either as discrete emotions or in dimensional terms. The Motivational State Questionnaire (MSQ) was developed to study emotions in laboratory and field settings. The data can be well described in terms of a two dimensional solution of energy vs tiredness and tension versus calmness. Additional items include what time of day the data were collected and a few personality questionnaire scores.
data(msq)
A data frame with 3896 observations on the following 92 variables.
active
a numeric vector
afraid
a numeric vector
alert
a numeric vector
angry
a numeric vector
anxious
a numeric vector
aroused
a numeric vector
ashamed
a numeric vector
astonished
a numeric vector
at.ease
a numeric vector
at.rest
a numeric vector
attentive
a numeric vector
blue
a numeric vector
bored
a numeric vector
calm
a numeric vector
cheerful
a numeric vector
clutched.up
a numeric vector
confident
a numeric vector
content
a numeric vector
delighted
a numeric vector
depressed
a numeric vector
determined
a numeric vector
distressed
a numeric vector
drowsy
a numeric vector
dull
a numeric vector
elated
a numeric vector
energetic
a numeric vector
enthusiastic
a numeric vector
excited
a numeric vector
fearful
a numeric vector
frustrated
a numeric vector
full.of.pep
a numeric vector
gloomy
a numeric vector
grouchy
a numeric vector
guilty
a numeric vector
happy
a numeric vector
hostile
a numeric vector
idle
a numeric vector
inactive
a numeric vector
inspired
a numeric vector
intense
a numeric vector
interested
a numeric vector
irritable
a numeric vector
jittery
a numeric vector
lively
a numeric vector
lonely
a numeric vector
nervous
a numeric vector
placid
a numeric vector
pleased
a numeric vector
proud
a numeric vector
quiescent
a numeric vector
quiet
a numeric vector
relaxed
a numeric vector
sad
a numeric vector
satisfied
a numeric vector
scared
a numeric vector
serene
a numeric vector
sleepy
a numeric vector
sluggish
a numeric vector
sociable
a numeric vector
sorry
a numeric vector
still
a numeric vector
strong
a numeric vector
surprised
a numeric vector
tense
a numeric vector
tired
a numeric vector
tranquil
a numeric vector
unhappy
a numeric vector
upset
a numeric vector
vigorous
a numeric vector
wakeful
a numeric vector
warmhearted
a numeric vector
wide.awake
a numeric vector
alone
a numeric vector
kindly
a numeric vector
scornful
a numeric vector
EA
Thayer's Energetic Arousal Scale
TA
Thayer's Tense Arousal Scale
PA
Positive Affect scale
NegAff
Negative Affect scale
Extraversion
Extraversion from the Eysenck Personality Inventory
Neuroticism
Neuroticism from the Eysenck Personality Inventory
Lie
Lie from the EPI
Sociability
The sociability subset of the Extraversion Scale
Impulsivity
The impulsivity subset of the Extraversion Scale
MSQ_Time
Time of day the data were collected
MSQ_Round
Rounded time of day
TOD
a numeric vector
TOD24
a numeric vector
ID
subject ID
condition
What was the experimental condition after the msq was given
scale
a factor with levels msq and r (original or revised msq)
exper
Study in which the data were collected: a factor with levels
AGES
BING
BORN
CART
CITY
COPE
EMIT
FAST
Fern
FILM
FLAT
Gray
imps
item
knob
MAPS
mite
pat-1
pat-2
PATS
post
RAFT
Rim.1
Rim.2
rob-1
rob-2
ROG1
ROG2
SALT
sam-1
sam-2
SAVE/PATS
sett
swam
swam-2
TIME
VALE-1
VALE-2
VIEW
The Motivational States Questionnaire (MSQ) is composed of 72 items, which represent the full affective space (Revelle & Anderson, 1998). The MSQ consists of 20 items taken from the Activation-Deactivation Adjective Check List (Thayer, 1986), 18 from the Positive and Negative Affect Schedule (PANAS, Watson, Clark, & Tellegen, 1988) along with the items used by Larsen and Diener (1992). The response format was a four-point scale that corresponds to Russell and Carroll's (1999) "ambiguous-likely-unipolar format" and that asks the respondents to indicate their current standing ("at this moment") with the following rating scale:
0 = Not at all
1 = A little
2 = Moderately
3 = Very much
The original version of the MSQ included 70 items. Intermediate analyses (done with 1840 subjects) demonstrated a concentration of items in some sections of the two dimensional space, and a paucity of items in others. To begin correcting this, 3 items from redundantly measured sections (alone, kindly, scornful) were removed, and 5 new ones (anxious, cheerful, idle, inactive, and tranquil) were added. Thus, the correlation matrix is missing the correlations between items anxious, cheerful, idle, inactive, and tranquil with alone, kindly, and scornful.
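Why those particular correlations are missing can be seen in a toy example: when two items were never administered to the same people, a pairwise-complete correlation has no cases to work with and returns NA. The responses below are invented, and only the item names are borrowed from the MSQ.

```r
# Invented responses: 'alone' was given only in the first six cases
# (original form), 'anxious' only in the last six (revised form)
msq_sim <- data.frame(
  happy   = c(0, 1, 2, 3, 1, 2, 3, 0, 2, 1, 3, 0),
  alone   = c(1, 0, 2, 3, 0, 1, NA, NA, NA, NA, NA, NA),
  anxious = c(NA, NA, NA, NA, NA, NA, 2, 3, 1, 0, 2, 3)
)

# Pairwise deletion uses whatever cases are available for each pair
r <- cor(msq_sim, use = "pairwise.complete.obs")
round(r, 2)

# Number of cases contributing to each pair of items
pair_n <- crossprod(!is.na(as.matrix(msq_sim)))
pair_n
```

Items that appear in both forms (here, happy) still correlate with everything, but each of those correlations is based on a different subset of cases.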
Procedure. The data were collected over nine years, as part of a series of studies examining the effects of personality and situational factors on motivational state and subsequent cognitive performance. In each of 38 studies, prior to any manipulation of motivational state, participants signed a consent form and filled out the MSQ. (The procedures of the individual studies are irrelevant to this data set and could not affect the responses to the MSQ, since this instrument was completed before any further instructions or tasks.) Some MSQ post test data (collected after manipulations) are available in affect.
The EA and TA scales are from Thayer, the PA and NA scales are from Watson et al. (1988). Scales and items:
Energetic Arousal: active, energetic, vigorous, wakeful, wide.awake, full.of.pep, lively, -sleepy, -tired, -drowsy (ADACL)
Tense Arousal: intense, jittery, fearful, tense, clutched up, -quiet, -still, -placid, -calm, -at rest (ADACL)
Positive Affect: active, alert, attentive, determined, enthusiastic, excited, inspired, interested, proud, strong (PANAS)
Negative Affect: afraid, ashamed, distressed, guilty, hostile, irritable, jittery, nervous, scared, upset (PANAS)
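A minus sign in these lists marks a reverse-keyed item. As a rough illustration of what that means on the 0-3 response scale, the base R sketch below scores a shortened Energetic Arousal key on made-up responses (the data frame `toy` and the four-item key are hypothetical). It is only a sketch of the arithmetic; psych::scoreItems handles the reversal itself when given a keys list.

```r
# Toy responses on the 0-3 MSQ scale (hypothetical values, three respondents)
toy <- data.frame(
  active = c(3, 1, 0),
  lively = c(2, 1, 0),
  sleepy = c(0, 2, 3),
  tired  = c(1, 2, 3)
)
keys <- c("active", "lively", "-sleepy", "-tired")  # shortened EA key

reverse <- startsWith(keys, "-")   # which items are reverse keyed
items   <- sub("^-", "", keys)     # strip the sign to get column names
scored  <- toy[, items]

# On a 0-3 scale a reversed response is 3 - x (0 <-> 3, 1 <-> 2)
scored[, reverse] <- 3 - scored[, reverse]

EA.toy <- rowMeans(scored)         # mean item response per respondent
EA.toy
```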
The PA and NA scales can in turn be thought of as having subscales (see the PANAS-X):
Fear: afraid, scared, nervous, jittery (not included: frightened, shaky)
Hostility: angry, hostile, irritable (not included: scornful, disgusted, loathing)
Guilt: ashamed, guilty (not included: blameworthy, angry at self, disgusted with self, dissatisfied with self)
Sadness: alone, blue, lonely, sad (not included: downhearted)
Joviality: cheerful, delighted, energetic, enthusiastic, excited, happy, lively (not included: joyful)
Self-assurance: proud, strong, confident (not included: bold, daring, fearless)
Attentiveness: alert, attentive, determined (not included: concentrating)
The next set of circumplex scales were taken (I think) from Larsen and Diener (1992):
High activation: active, aroused, surprised, intense, astonished
Activated PA: elated, excited, enthusiastic, lively
Unactivated NA: calm, serene, relaxed, at rest, content, at ease
PA: happy, warmhearted, pleased, cheerful, delighted
Low Activation: quiet, inactive, idle, still, tranquil
Unactivated PA: dull, bored, sluggish, tired, drowsy
NA: sad, blue, unhappy, gloomy, grouchy
Activated NA: jittery, anxious, nervous, fearful, distressed
Keys for these separate scales are shown in the examples.
In addition to the MSQ, there are 5 scales from the Eysenck Personality Inventory (Extraversion, Impulsivity, Sociability, Neuroticism, Lie). The Imp and Soc scales are subsets of the total extraversion scale.
Data collected at the Personality, Motivation, and Cognition Laboratory, Northwestern University.
Larsen, R. J., & Diener, E. (1992). Promises and problems with the circumplex model of emotion. In M. S. Clark (Ed.), Review of personality and social psychology, No. 13. Emotion (pp. 25-59). Thousand Oaks, CA, US: Sage Publications, Inc.
Rafaeli, Eshkol and Revelle, William (2006), A premature consensus: Are happiness and sadness truly opposite affects? Motivation and Emotion, 30, 1, 1-12.
Revelle, W. and Anderson, K.J. (1998) Personality, motivation and cognitive performance: Final report to the Army Research Institute on contract MDA 903-93-K-0008. (https://www.personality-project.org/revelle/publications/ra.ari.98.pdf).
Thayer, R.E. (1989) The biopsychology of mood and arousal. Oxford University Press. New York, NY.
Watson, D., Clark, L.A. and Tellegen, A. (1988) Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54(6):1063-1070.
msqR
for a larger data set with repeated measures for 3032 participants measured at least once, 2753 measured twice, 446 three times and 181 four times. affect
for an example of the use of some of these adjectives in a mood manipulation study.
make.keys
, scoreItems
and scoreOverlap
for instructions on how to score multiple scales with and without item overlap. Also see fa
and fa.extension
for instructions on how to do factor analyses or factor extension.
data(msq)
#in the interests of time
#basic descriptive statistics
psych::describe(msq)

#score them for 19 short scales -- note that these have item overlap
#The first 2 are from Thayer
#The next 2 are classic positive and negative affect
#The next 8 are circumplex scales
#the last 7 are msq estimates of PANASX scales (missing some items)
keys.list <- list(
  EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
         "lively", "-sleepy", "-tired", "-drowsy"),
  TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
         "-placid", "-calm", "-at.rest"),
  PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
         "interested", "enthusiastic", "proud", "alert"),
  NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
          "upset", "hostile", "irritable"),
  HAct = c("active", "aroused", "surprised", "intense", "astonished"),
  aPA = c("elated", "excited", "enthusiastic", "lively"),
  uNA = c("calm", "serene", "relaxed", "at.rest", "content", "at.ease"),
  pa = c("happy", "warmhearted", "pleased", "cheerful", "delighted"),
  LAct = c("quiet", "inactive", "idle", "still", "tranquil"),
  uPA = c("dull", "bored", "sluggish", "tired", "drowsy"),
  naf = c("sad", "blue", "unhappy", "gloomy", "grouchy"),
  aNA = c("jittery", "anxious", "nervous", "fearful", "distressed"),
  Fear = c("afraid", "scared", "nervous", "jittery"),
  Hostility = c("angry", "hostile", "irritable", "scornful"),
  Guilt = c("guilty", "ashamed"),
  Sadness = c("sad", "blue", "lonely", "alone"),
  Joviality = c("happy", "delighted", "cheerful", "excited", "enthusiastic", "lively",
                "energetic"),
  Self.Assurance = c("proud", "strong", "confident", "-fearful"),
  Attentiveness = c("alert", "determined", "attentive")
  #, acquiscence = c("sleepy", "wakeful", "relaxed", "tense")
  #dropped because it has a negative alpha and throws warnings
)
msq.scores <- psych::scoreItems(keys.list,msq)

#show a circumplex structure for the non-overlapping items
fcirc <- psych::fa(msq.scores$scores[,5:12],2)
psych::fa.plot(fcirc,labels=colnames(msq.scores$scores)[5:12])

#now, find the correlations corrected for item overlap
msq.overlap <- psych::scoreOverlap(keys.list,msq)
#a warning is thrown by smc because of some NAs in the matrix
f2 <- psych::fa(msq.overlap$cor,2)
psych::fa.plot(f2,labels=colnames(msq.overlap$cor),
  title="2 dimensions of affect, corrected for overlap")

#extend this solution to EA/TA NA/PA space
fe <- psych::fa.extension(cor(msq.scores$scores[,5:12],msq.scores$scores[,1:4]),fcirc)
psych::fa.diagram(fcirc,fe=fe,
  main="Extending the circumplex structure to EA/TA and PA/NA ")

#show the 2 dimensional structure
f2 <- psych::fa(msq[1:72],2)
psych::fa.plot(f2,labels=colnames(msq)[1:72],
  title="2 dimensions of affect at the item level",cex=.5)

#sort them by polar coordinates
round(psych::polar(f2),2)
Emotions may be described either as discrete emotions or in dimensional terms. The Motivational State Questionnaire (MSQ) was developed to study emotions in laboratory and field settings. The data can be well described in terms of a two dimensional solution of energy vs tiredness and tension versus calmness. Alternatively, this space can be organized by the two dimensions of Positive Affect and Negative Affect. Additional items include what time of day the data were collected and a few personality questionnaire scores. 3032 unique participants took the MSQ at least once, 2753 at least twice, 446 three times, and 181 four times. The 3032 participants also took the sai
state anxiety inventory at the same time. Some studies manipulated arousal with caffeine; other manipulations included affect-inducing movies.
data("msqR")
A data frame with 6411 observations on the following 88 variables.
active
a numeric vector
afraid
a numeric vector
alert
a numeric vector
alone
a numeric vector
angry
a numeric vector
aroused
a numeric vector
ashamed
a numeric vector
astonished
a numeric vector
at.ease
a numeric vector
at.rest
a numeric vector
attentive
a numeric vector
blue
a numeric vector
bored
a numeric vector
calm
a numeric vector
clutched.up
a numeric vector
confident
a numeric vector
content
a numeric vector
delighted
a numeric vector
depressed
a numeric vector
determined
a numeric vector
distressed
a numeric vector
drowsy
a numeric vector
dull
a numeric vector
elated
a numeric vector
energetic
a numeric vector
enthusiastic
a numeric vector
excited
a numeric vector
fearful
a numeric vector
frustrated
a numeric vector
full.of.pep
a numeric vector
gloomy
a numeric vector
grouchy
a numeric vector
guilty
a numeric vector
happy
a numeric vector
hostile
a numeric vector
inspired
a numeric vector
intense
a numeric vector
interested
a numeric vector
irritable
a numeric vector
jittery
a numeric vector
lively
a numeric vector
lonely
a numeric vector
nervous
a numeric vector
placid
a numeric vector
pleased
a numeric vector
proud
a numeric vector
quiescent
a numeric vector
quiet
a numeric vector
relaxed
a numeric vector
sad
a numeric vector
satisfied
a numeric vector
scared
a numeric vector
serene
a numeric vector
sleepy
a numeric vector
sluggish
a numeric vector
sociable
a numeric vector
sorry
a numeric vector
still
a numeric vector
strong
a numeric vector
surprised
a numeric vector
tense
a numeric vector
tired
a numeric vector
unhappy
a numeric vector
upset
a numeric vector
vigorous
a numeric vector
wakeful
a numeric vector
warmhearted
a numeric vector
wide.awake
a numeric vector
anxious
a numeric vector
cheerful
a numeric vector
idle
a numeric vector
inactive
a numeric vector
tranquil
a numeric vector
kindly
a numeric vector
scornful
a numeric vector
Extraversion
Extraversion from the EPI
Neuroticism
Neuroticism from the EPI
Lie
Lie from the EPI
Sociability
Sociability from the EPI
Impulsivity
Impulsivity from the EPI
gender
1= male, 2 = female (coded on presumed x chromosome). Slowly being added to the data set.
TOD
Time of day that the study was run
drug
1 if given placebo, 2 if given caffeine
film
1-4 if given a film: 1=Frontline, 2= Halloween, 3=Serengeti, 4 = Parenthood
time
Measurement occasion (1 and 2 are same session, 3 and 4 are the same, but a later session)
id
a numeric vector
form
msq versus msqR
study
a character vector of the experiment name
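The integer codes listed above for drug and film can be turned into labelled factors before tabulating. The base R sketch below applies the codes from the variable descriptions to a small made-up data frame (`toy` and its values are hypothetical); the same two factor() calls would presumably carry over to msqR$drug and msqR$film.

```r
# Toy condition codes (hypothetical rows); NA means the manipulation was not used
toy <- data.frame(drug = c(1, 2, 2, NA),
                  film = c(NA, 3, 1, 4))

# 1 = placebo, 2 = caffeine
toy$drug.f <- factor(toy$drug, levels = 1:2,
                     labels = c("placebo", "caffeine"))

# 1 = Frontline, 2 = Halloween, 3 = Serengeti, 4 = Parenthood
toy$film.f <- factor(toy$film, levels = 1:4,
                     labels = c("Frontline", "Halloween", "Serengeti", "Parenthood"))

# Cross-tabulate the labelled conditions, keeping the NA margin visible
table(toy$drug.f, toy$film.f, useNA = "ifany")
```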
The Motivational States Questionnaire (MSQ) is composed of 75 items, which represent the full affective space (Revelle & Anderson, 1998). The MSQ consists of 20 items taken from the Activation-Deactivation Adjective Check List (Thayer, 1986), 18 from the Positive and Negative Affect Schedule (PANAS, Watson, Clark, & Tellegen, 1988), along with the affective circumplex items used by Larsen and Diener (1992). The response format was a four-point scale that corresponds to Russell and Carroll's (1999) "ambiguous-likely-unipolar format" and that asks the respondents to indicate their current standing ("at this moment") with the following rating scale:
0 = Not at all, 1 = A little, 2 = Moderately, 3 = Very much
The original version of the MSQ included 70 items. Intermediate analyses (done with 1840 subjects) demonstrated a concentration of items in some sections of the two dimensional space, and a paucity of items in others. To begin correcting this, 3 items from redundantly measured sections (alone, kindly, scornful) were removed, and 5 new ones (anxious, cheerful, idle, inactive, and tranquil) were added. Thus, the correlation matrix is missing the correlations between items anxious, cheerful, idle, inactive, and tranquil with alone, kindly, and scornful.
2605 individuals took the Form 1 version, 3806 the Form 2 version. 3032 people (1218 form 1, 1814 form 2) took the MSQ at least once, 2086 at least twice, 1112 three times, and 181 four times.
To see the relative frequencies by time and form, see the first example.
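Counts of the "at least once, at least twice" kind can be derived from a long-format data set with one row per participant per occasion. The base R sketch below does this on a made-up data frame (`toy` and its values are hypothetical, with columns named after id, time and form).

```r
# Toy long-format data (hypothetical): one row per participant per occasion
toy <- data.frame(
  id   = c(1, 1, 2, 3, 3, 3, 4),
  time = c(1, 2, 1, 1, 2, 3, 1),
  form = c("msq", "msq", "msqR", "msqR", "msqR", "msqR", "msq")
)

# Observations by form and measurement occasion
table(toy$form, toy$time)

# Number of occasions per participant, then how many have at least k occasions
n.occasions <- table(toy$id)
at.least <- sapply(1:max(n.occasions), function(k) sum(n.occasions >= k))
at.least
```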
Procedure. The data were collected over nine years in the Personality, Motivation and Cognition laboratory at Northwestern, as part of a series of studies examining the effects of personality and situational factors on motivational state and subsequent cognitive performance. In each of 38 studies, prior to any manipulation of motivational state, participants signed a consent form and, in some studies, consumed 0 or 4 mg/kg of caffeine. In caffeine studies, they waited 30 minutes and then filled out the MSQ. (Normally, the procedures of the individual studies are irrelevant to this data set and could not affect the responses to the MSQ at time 1, since this instrument was completed before any further instructions or tasks. However, caffeine does have an effect.) The MSQ post test (following a movie manipulation) is available in affect
as well as here.
The XRAY study crossed four movie conditions with caffeine. The first MSQ measures show the effects of the movies and caffeine, but after an additional 30 minutes, the second MSQ seems to show mainly the caffeine effects. The movies were 9 minute clips from 1) a BBC documentary on British troops arriving at the Bergen-Belsen concentration camp (sad); 2) an early scene from Halloween in which the heroine runs around shutting doors and windows (terror); 3) a documentary about lions on the Serengeti plain; and 4) the "birthday party" scene from Parenthood.
The FLAT study measured affect before, immediately after, and then after 30 minutes following a movie manipulation. See the affect
data set.
To see which studies used which conditions, see the second and third examples.
The EA and TA scales are from Thayer, the PA and NA scales are from Watson et al. (1988). Scales and items:
Energetic Arousal: active, energetic, vigorous, wakeful, wide.awake, full.of.pep, lively, -sleepy, -tired, -drowsy (ADACL)
Tense Arousal: intense, jittery, fearful, tense, clutched up, -quiet, -still, -placid, -calm, -at rest (ADACL)
Positive Affect: active, alert, attentive, determined, enthusiastic, excited, inspired, interested, proud, strong (PANAS)
Negative Affect: afraid, ashamed, distressed, guilty, hostile, irritable, jittery, nervous, scared, upset (PANAS)
The PA and NA scales can in turn be thought of as having subscales (see the PANAS-X):
Fear: afraid, scared, nervous, jittery (not included: frightened, shaky)
Hostility: angry, hostile, irritable (not included: scornful, disgusted, loathing)
Guilt: ashamed, guilty (not included: blameworthy, angry at self, disgusted with self, dissatisfied with self)
Sadness: alone, blue, lonely, sad (not included: downhearted)
Joviality: cheerful, delighted, energetic, enthusiastic, excited, happy, lively (not included: joyful)
Self-assurance: proud, strong, confident (not included: bold, daring, fearless)
Attentiveness: alert, attentive, determined (not included: concentrating)
The next set of circumplex scales were taken from Larsen and Diener (1992):
High activation: active, aroused, surprised, intense, astonished
Activated PA: elated, excited, enthusiastic, lively
Unactivated NA: calm, serene, relaxed, at rest, content, at ease
PA: happy, warmhearted, pleased, cheerful, delighted
Low Activation: quiet, inactive, idle, still, tranquil
Unactivated PA: dull, bored, sluggish, tired, drowsy
NA: sad, blue, unhappy, gloomy, grouchy
Activated NA: jittery, anxious, nervous, fearful, distressed
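In a circumplex, scales are arranged by their angle in a two-factor space. The base R sketch below shows the underlying arithmetic on made-up loadings (the matrix `load` and the scale names s1 to s4 are hypothetical): each scale's angle comes from atan2 and its squared vector length from the sum of squared loadings. This illustrates the idea only and is not a reproduction of what psych::polar returns.

```r
# Hypothetical loadings of four scales on two orthogonal factors
load <- matrix(c( 0.8,  0.0,
                  0.0,  0.7,
                 -0.6,  0.0,
                  0.0, -0.5),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("s1", "s2", "s3", "s4"), c("F1", "F2")))

# Angle in degrees, mapped onto 0-360
angle <- (atan2(load[, "F2"], load[, "F1"]) * 180 / pi) %% 360

# Squared vector length (the communality in a two-factor solution)
length2 <- rowSums(load^2)

polar.toy <- data.frame(angle = angle, length2 = length2)
polar.toy[order(polar.toy$angle), ]   # scales sorted around the circle
```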
Keys for these separate scales are shown in the examples.
In addition to the MSQ, there are 5 scales from the Eysenck Personality Inventory (Extraversion, Impulsivity, Sociability, Neuroticism, Lie). The Imp and Soc scales are subsets of the total extraversion scale, based upon a reanalysis of the EPI by Rocklin and Revelle (1983). This information is in the msq
data set as well.
In December, 2018 the caffeine, film and personality conditions were added. In the process of doing so, it was discovered that the EMIT data had been incorrectly entered. This has been fixed.
Data collected at the Personality, Motivation, and Cognition Laboratory, Northwestern University.
Larsen, R. J., & Diener, E. (1992). Promises and problems with the circumplex model of emotion. In M. S. Clark (Ed.), Review of personality and social psychology, No. 13. Emotion (pp. 25-59). Thousand Oaks, CA, US: Sage Publications, Inc.
Rafaeli, Eshkol and Revelle, William (2006), A premature consensus: Are happiness and sadness truly opposite affects? Motivation and Emotion, 30, 1, 1-12.
Revelle, W. and Anderson, K.J. (1998) Personality, motivation and cognitive performance: Final report to the Army Research Institute on contract MDA 903-93-K-0008. (https://www.personality-project.org/revelle/publications/ra.ari.98.pdf).
Smillie, Luke D. and Cooper, Andrew and Wilt, Joshua and Revelle, William (2012) Do Extraverts Get More Bang for the Buck? Refining the Affective-Reactivity Hypothesis of Extraversion. Journal of Personality and Social Psychology, 103 (2), 306-326.
Thayer, R.E. (1989) The biopsychology of mood and arousal. Oxford University Press. New York, NY.
Watson, D., Clark, L.A. and Tellegen, A. (1988) Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54(6):1063-1070.
msq
for 3896 participants with scores on five scales of the EPI. affect
for an example of the use of some of these adjectives in a mood manipulation study.
make.keys
, scoreItems
and scoreOverlap
for instructions on how to score multiple scales with and without item overlap. Also see fa
and fa.extension
for instructions on how to do factor analyses or factor extension.
Given the temporal ordering of the sai
data and the msqR
data, these data are useful for demonstrations of testRetest
reliability. See the examples in testRetest
for how to combine the sai
, tai
and msqR
datasets.
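The idea behind a test-retest analysis of repeated measures can be sketched in base R: pair each participant's score at time 1 with the same participant's score at time 2 and correlate the two. The example below uses made-up scores (`toy`, `anx` and their values are hypothetical); psych::testRetest offers a much fuller analysis.

```r
# Toy long-format scores (hypothetical): five participants, two occasions each
toy <- data.frame(
  id   = rep(1:5, each = 2),
  time = rep(1:2, times = 5),
  anx  = c(1.0, 1.2,  2.0, 1.8,  0.5, 0.7,  2.5, 2.6,  1.5, 1.1)
)

# Split by occasion and match the two occasions on participant id
t1 <- toy[toy$time == 1, c("id", "anx")]
t2 <- toy[toy$time == 2, c("id", "anx")]
both <- merge(t1, t2, by = "id", suffixes = c(".t1", ".t2"))

# Test-retest correlation across participants
r.tt <- cor(both$anx.t1, both$anx.t2)
r.tt
```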
data(msqR)
table(msqR$form,msqR$time) #which forms?
table(msqR$study,msqR$drug) #Drug studies
table(msqR$study,msqR$film) #Film studies
table(msqR$study,msqR$TOD) #To examine time of day

#score them for 19 short scales -- note that these have item overlap
#The first 2 are from Thayer
#The next 2 are classic positive and negative affect
#The next 8 are circumplex scales
#the last 7 are msq estimates of PANASX scales (missing some items)
keys.list <- list(
  EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
         "lively", "-sleepy", "-tired", "-drowsy"),
  TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
         "-placid", "-calm", "-at.rest"),
  PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
         "interested", "enthusiastic", "proud", "alert"),
  NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
          "upset", "hostile", "irritable"),
  HAct = c("active", "aroused", "surprised", "intense", "astonished"),
  aPA = c("elated", "excited", "enthusiastic", "lively"),
  uNA = c("calm", "serene", "relaxed", "at.rest", "content", "at.ease"),
  pa = c("happy", "warmhearted", "pleased", "cheerful", "delighted"),
  LAct = c("quiet", "inactive", "idle", "still", "tranquil"),
  uPA = c("dull", "bored", "sluggish", "tired", "drowsy"),
  naf = c("sad", "blue", "unhappy", "gloomy", "grouchy"),
  aNA = c("jittery", "anxious", "nervous", "fearful", "distressed"),
  Fear = c("afraid", "scared", "nervous", "jittery"),
  Hostility = c("angry", "hostile", "irritable", "scornful"),
  Guilt = c("guilty", "ashamed"),
  Sadness = c("sad", "blue", "lonely", "alone"),
  Joviality = c("happy", "delighted", "cheerful", "excited", "enthusiastic", "lively",
                "energetic"),
  Self.Assurance = c("proud", "strong", "confident", "-fearful"),
  Attentiveness = c("alert", "determined", "attentive"))
#acquiscence = c("sleepy", "wakeful", "relaxed", "tense"))

#Yik Russell and Steiger list the following items
Yik.keys <- list(
  pleasure = psych::cs(happy,content,satisfied, pleased),
  act.pleasure = psych::cs(proud,enthusiastic,euphoric),
  pleasant.activation = psych::cs(energetic,full.of.pep,excited,wakeful,attentive,
    wide.awake,active,alert,vigorous),
  activation = psych::cs(aroused,hyperactivated,intense),
  unpleasant.act = psych::cs(anxious,frenzied,jittery,nervous),
  activated.displeasure = psych::cs(scared,upset,shaky,fearful,clutched.up,tense,
    ashamed,guilty,agitated,hostile),
  displeaure = psych::cs(troubled,miserable,unhappy,dissatisfied),
  Ueactivated.Displeasure = psych::cs(sad,down,gloomy,blue,melancholy),
  Unpleasant.Deactivation = psych::cs(droopy,drowsy,dull,bored,sluggish,tired),
  Deactivation = psych::cs(quiet,still),
  pleasant.deactivation = psych::cs(placid,relaxed,tranquil, at.rest,calm),
  deactived.pleasure = psych::cs(serene,soothed,peaceful,at.ease,secure)
)

#of these 60 items, 46 appear in the msqR
Yik.msq.keys <- list(
  Pleasure = psych::cs(happy,content,satisfied, pleased),
  Activated.Pleasure = psych::cs(proud,enthusiastic),
  Pleasant.Activation = psych::cs(energetic,full.of.pep,excited,wakeful,attentive,
    wide.awake,active,alert,vigorous),
  Activation = psych::cs(aroused,intense),
  Unpleasant.Activation = psych::cs(anxious,jittery,nervous),
  Activated.Displeasure = psych::cs(scared,upset,fearful,
    clutched.up,tense,ashamed,guilty,hostile),
  Displeasure = psych::cs(unhappy),
  Deactivated.Displeasure = psych::cs(sad,gloomy,blue),
  Unpleasant.Deactivation = psych::cs(drowsy,dull,bored,sluggish,tired),
  Deactivation = psych::cs(quiet,still),
  Pleasant.Deactivation = psych::cs(placid,relaxed,tranquil, at.rest,calm),
  Deactivated.Pleasure = psych::cs(serene,at.ease)
)
yik.scores <- psych::scoreItems(Yik.msq.keys,msqR)
yik <- yik.scores$scores
f2.yik <- psych::fa(yik,2) #factor the yik scores
psych::fa.plot(f2.yik,labels=colnames(yik),title="Yik-Russell-Steiger circumplex",cex=.8,
  pos=(c(1,1,2,1,1,1,3,1,4,1,2,4)))

msq.scores <- psych::scoreItems(keys.list,msqR)

#show a circumplex structure for the non-overlapping items
fcirc <- psych::fa(msq.scores$scores[,5:12],2)
psych::fa.plot(fcirc,labels=colnames(msq.scores$scores)[5:12])

#now, find the correlations corrected for item overlap
msq.overlap <- psych::scoreOverlap(keys.list,msqR)
f2 <- psych::fa(msq.overlap$cor,2)
psych::fa.plot(f2,labels=colnames(msq.overlap$cor),
  title="2 dimensions of affect, corrected for overlap")

#extend this solution to EA/TA NA/PA space
fe <- psych::fa.extension(cor(msq.scores$scores[,5:12],msq.scores$scores[,1:4]),fcirc)
psych::fa.diagram(fcirc,fe=fe,main="Extending the circumplex structure to EA/TA and PA/NA ")

#show the 2 dimensional structure
f2 <- psych::fa(msqR[1:72],2)
psych::fa.plot(f2,labels=colnames(msqR)[1:72],title="2 dimensions of affect at the item level")

#sort them by polar coordinates
round(psych::polar(f2),2)

#the msqR and sai data sets have 10 overlapping items which can be used for
#testRetest analysis. We need to specify the keys, and then choose the appropriate
#data sets
sai.msq.keys <- list(
  pos = c("at.ease", "calm", "confident", "content", "relaxed"),
  neg = c("anxious", "jittery", "nervous", "tense", "upset"),
  anx = c("anxious", "jittery", "nervous", "tense", "upset", "-at.ease", "-calm",
          "-confident", "-content", "-relaxed"))
select <- psych::selectFromKeys(sai.msq.keys$anx)

#The following is useful for examining test retest reliabilities
msq.control <- subset(msqR,is.element( msqR$study , c("Cart", "Fast", "SHED", "SHOP")))
msq.film <- subset(msqR,(is.element( msqR$study , c("FIAT", "FILM","FLAT","MIXX","XRAY"))
  & (msqR$time < 3) ))
msq.film[((msq.film$study == "FLAT") & (msq.film$time ==3)) ,] <- NA
msq.drug <- subset(msqR,(is.element( msqR$study , c("AGES","SALT", "VALE", "XRAY")))
  &(msqR$time < 3))
msq.day <- subset(msqR,is.element( msqR$study , c("SAM", "RIM")))
The NEO.PI.R is a widely used personality test to assess 5 broad factors (Neuroticism, Extraversion, Openness, Agreeableness and Conscientiousness) with six facet scales for each factor. The correlation matrix of the facets is reported in the NEO.PI.R manual for 1000 subjects.
data(neo)
A data frame of a 30 x 30 correlation matrix with the following 30 variables.
Anxiety
AngryHostility
Depression
Self-Consciousness
Impulsiveness
Vulnerability
Warmth
Gregariousness
Assertiveness
Activity
Excitement-Seeking
PositiveEmotions
Fantasy
Aesthetics
Feelings
Ideas
Actions
Values
Trust
Straightforwardness
Altruism
Compliance
Modesty
Tender-Mindedness
Competence
Order
Dutifulness
AchievementStriving
Self-Discipline
Deliberation
The past thirty years of personality research have led to a general consensus on the identification of major dimensions of personality. Variously known as the "Big 5" or the "Five Factor Model", the general solution represents 5 broad domains of personal and interpersonal experience. Neuroticism and Extraversion are thought to reflect sensitivity to negative and positive cues from the environment and the tendency to withdraw or approach. Openness is sometimes labeled as Intellect and reflects an interest in new ideas and experiences. Agreeableness and Conscientiousness reflect tendencies to get along with others and to want to get ahead.
The factor structure of the NEO suggests five correlated factors as well as two higher level factors. The NEO was constructed with 6 "facets" for each of the five broad factors.
For a contrasting structure, examine the items of the spi
data set (Condon, 2017).
Costa, Paul T. and McCrae, Robert R. (1992) (NEO PI-R) professional manual. Psychological Assessment Resources, Inc. Odessa, FL. (with permission of the author and the publisher)
Condon, D. (2017) The SAPA Personality Inventory: An empirically-derived, hierarchically-organized self-report personality assessment model.
Digman, John M. (1990) Personality structure: Emergence of the five-factor model. Annual Review of Psychology. 41, 417-440.
John M. Digman (1997) Higher-order factors of the Big Five. Journal of Personality and Social Psychology, 73, 1246-1256.
McCrae, Robert R. and Costa, Paul T., Jr. (1999) A Five-Factor theory of personality. In Pervin, Lawrence A. and John, Oliver P. (eds) Handbook of personality: Theory and research (2nd ed.) 139-153. Guilford Press, New York. N.Y.
Revelle, William (1995), Personality processes, Annual Review of Psychology, 46, 295-328.
Joshua Wilt and William Revelle (2009) Extraversion and Emotional Reactivity. In Mark Leary and Rick H. Hoyle (eds). Handbook of Individual Differences in Social Behavior. Guilford Press, New York, N.Y.
Joshua Wilt and William Revelle (2016) Extraversion. In Thomas Widiger (ed) The Oxford Handbook of the Five Factor Model. Oxford University Press.
data(neo)
n5 <- psych::fa(neo,5)
neo.keys <- psych::make.keys(30,list(N=c(1:6),E=c(7:12),O=c(13:18),A=c(19:24),C=c(25:30)))
n5p <- psych::target.rot(n5,neo.keys) #show a targeted rotation for simple structure
n5p
Francis Galton introduced the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
data(peas)
A data frame with 700 observations on the following 2 variables.
parent
The mean diameter of the mother pea for 700 peas
child
The mean diameter of the daughter pea for 700 sweet peas
Galton's introduction of the correlation coefficient was perhaps the most important contribution to the study of individual differences. This data set allows a graphical analysis of his data. There are two different graphic examples. One shows the regression lines for both relationships, the other finds the correlation as well.
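The link between the two regression lines and the correlation can be checked directly: the product of the two slopes equals the squared correlation. The following is a minimal base R sketch on simulated values; the numbers are made up for illustration and are not Galton's data.

```r
# Simulated parent/child sizes; NOT Galton's data, purely illustrative.
set.seed(42)
parent <- rnorm(200, mean = 18, sd = 2)
child  <- 11 + 0.33 * parent + rnorm(200, sd = 1)

# Slopes of the two regression lines
b_child_on_parent <- unname(coef(lm(child ~ parent))[2])
b_parent_on_child <- unname(coef(lm(parent ~ child))[2])
r <- cor(parent, child)

# The correlation is the geometric mean of the two slopes,
# carrying the sign of the slopes.
r_from_slopes <- sign(b_child_on_parent) *
  sqrt(b_child_on_parent * b_parent_on_child)
all.equal(r, r_from_slopes)
```

With the peas data loaded, the same lines could be run with peas$parent and peas$child in place of the simulated vectors.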
Stanton, Jeffrey M. (2001) Galton, Pearson, and the Peas: A brief history of linear regression for statistics instructors, Journal of Statistics Education, 9. (retrieved from the web from https://www.amstat.org/publications/jse/v9n3/stanton.html) reproduces the table from Galton, 1894, Table 2.
The data were generated from this table.
Galton, Francis (1877) Typical laws of heredity. paper presented to the weekly evening meeting of the Royal Institution, London. Volume VIII (66) is the first reference to this data set. The data appear in
Galton, Francis (1894) Natural Inheritance (5th Edition), New York: MacMillan.
The other Galton data sets: heights
, galton
, cubits
data(peas)
psych::pairs.panels(peas,lm=TRUE,xlim=c(14,22),ylim=c(14,22),main="Galton's Peas")
psych::describe(peas)
psych::pairs.panels(peas,main="Galton's Peas")
A correlation matrix taken from Pollack (2012) with 9 variables. Primarily used as an example for setCor and mediation.
data("Pollack")
A correlation matrix based upon 262 participants.
sex
Male = 1, Female = 0, 62% male
age
mean =33
tenure
length of employment, mean = 5.9 years
self.efficacy
self ratings
competence
self rating of competence
social.ties
Contact with business-related social ties
economic.stress
mean of two items on economic stress
depression
6 items from MAACL measuring depression
withdrawal
Withdrawal intentions in domain of entrepreneurship
This is the correlation matrix from Pollack et al. (2012), p 797. The raw data are available from the processR package (Keon-Woong Moon, 2020). The data set is used by Hayes (2013) in example 3 on p 179.
Pollack et al. 2012
Pollack, Jeffrey M. and Vanepps, Eric M. and Hayes, Andrew F. (2012). The moderating role of social ties on entrepreneurs' depressed affect and withdrawal intentions in response to economic stress, Journal of Organizational Behavior 33 (6) 789-810.
Hayes, Andrew F. (2013) Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press.
psych::lowerMat(Pollack)
PsychTools includes the larger data sets used by the psych
package and also includes a few general utility functions such as the read.file
and read.clipboard
functions. The data sets are made available for demonstrations of a variety of psychometric functions.
See the various helpfiles listed in the index or as links from here. Also see the main functions in the psych package 00.psych-package
.
Data sets from the SAPA/ICAR project:
ability | 16 ICAR ability items scored as correct or incorrect for 1525 participants. |
iqitems | multiple choice IQ items (raw responses) |
affect | Two data sets of affect and arousal scores as a function of personality and movie conditions |
bfi | 25 Personality items representing 5 factors from the SAPA project for 2800 participants |
bfi.dictionary | Dictionary of the bfi |
big5.100.adjectives | 100 adjectives describing the "big 5" for 502 subjects (from Goldberg) |
colom | Correlations from the Spanish WAIS (14 scales) |
eminence | Eminence of 69 American Psychologists |
epi | Eysenck Personality Inventory (EPI) data for 3570 participants |
epi.dictionary | The items for the epi |
epi.bfi | 13 personality scales from the Eysenck Personality Inventory and Big 5 inventory |
epiR | 474 participants took the epi twice |
msq | 75 mood items from the Motivational State Questionnaire for 3896 participants |
msqR | 75 mood items from the Motivational State Questionnaire for 3032 unique participants |
tai | Trait Anxiety data from the PMC lab matching the sai sample. 3032 unique subjects |
sai | State Anxiety data from the PMC lab over multiple occasions. 3032 unique subjects. |
sai.dictionary | items used in the sai |
spi | 4000 cases from the SAPA Personality Inventory (135 items, 10 demographics) including an item dictionary and scoring keys. |
spi.dictionary | The items for the spi |
spi.keys | Scoring keys for the spi |
Historically interesting data sets
burt | 11 emotional variables from Burt (1915) |
galton | Galton's Mid parent child height data |
heights | A data.frame of the Galton (1888) height and cubit data set |
cubits | Galton's example of the relationship between height and cubit or forearm length |
peas | Galton's Peas |
cushny | The data set from Cushny and Peebles (1905) on the effect of three drugs on hours of sleep, used by Student (1908) |
holzinger.swineford | 26 cognitive variables + 7 demographic variables for 301 cases from Holzinger and Swineford. |
Miscellaneous example data sets
blant | A 29 x 29 matrix that produces weird factor analytic results |
blot | Bonds Logical Operations Test - BLOT |
cities | Distances between 11 US cities |
city.location | and their geographical location |
income | US family income from US census 2008 |
all.income | US family income from US census 2008 |
neo | NEO correlation matrix from the NEO_PI_R manual |
Schutz | The Schutz correlation matrix example from Shapiro and ten Berge |
Spengler | The Spengler and Damian correlation matrix example from Spengler, Damian and Roberts (2018) |
Damian | Another correlation matrix from Spengler, Damian and Roberts (2018) |
usaf | A correlation of 17 body size (anthropometric) measures from the US Air Force. Adapted from the Anthropometric package. |
veg | Paired comparison of preferences for 9 vegetables (scaling example) |
Functions to convert various objects to latex
fa2latex | Convert a data frame, correlation matrix, or factor analysis output to a LaTeX table |
df2latex | Convert a data frame, correlation matrix, or factor analysis output to a LaTeX table |
ICC2latex | Convert an ICC analysis output to a LaTeX table |
irt2latex | Convert an irt analysis output to a LaTeX table |
cor2latex | Convert a correlation matrix output to a LaTeX table |
omega2latex | Convert a data frame, correlation matrix, or factor analysis output to a LaTeX table |
File manipulation functions
fileCreate | Create a file |
fileScan | Show the first few lines of multiple files |
filesInfo | Show the information for all files in a directory |
filesList | Show the names of all files in a directory |
dfOrder | Sorts a data frame |
vJoin | Combine two matrices or data frames into one based upon variable labels |
combineMatrices | Takes a square matrix (x) and combines with a rectangular matrix y to produce a larger xy matrix. |
File input/output functions
read.clipboard | Shortcuts for reading from the clipboard or a file |
read.clipboard.csv | |
read.clipboard.fwf | |
read.clipboard.lower | |
read.clipboard.tab | |
read.clipboard.upper | |
read.file | Read a file according to its suffix |
read.file.csv | |
read.https | |
write.file | Write data to a file |
write.file.csv | |
psych::describe(ability)
Just a wrapper for tools::Rd2HTML to find a directory (e.g., the Man directory of help files) and convert the files in it to HTML files in a new directory. Useful for adding HTML help files to a local web page.
rd2html(inDir =NULL,outDir=NULL, nfiles=NULL,package="psych",file=NULL)
inDir |
The input directory. If NULL, then a file in a directory will be searched for using file.choose() |
outDir |
Where to write the output files |
nfiles |
If not NULL, then how many files should be written |
package |
name of package |
file |
If specified, just convert this one file to HTML |
Just a wrapper for Rd2HTML calling some file tools. An interesting use of the function is to precheck whether all the help files are syntactically correct.
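The conversion step that rd2html wraps can be tried on a single file with the tools package alone. The sketch below is not the rd2html code itself; the toy Rd content and the temporary file names are made up for illustration. Parsing the file first is also a simple way to catch syntax problems in a help file.

```r
# A toy Rd file (hypothetical content) written to a temporary location.
rd_file   <- tempfile(fileext = ".Rd")
html_file <- tempfile(fileext = ".html")
writeLines(c(
  "\\name{toy}",
  "\\alias{toy}",
  "\\title{Toy title}",
  "\\description{A toy help page.}"
), rd_file)

# parse_Rd() complains about malformed Rd, which gives the syntax precheck.
rd <- tools::parse_Rd(rd_file)

# Convert the parsed page to an HTML file.
tools::Rd2HTML(rd, out = html_file)
html <- readLines(html_file)
any(grepl("Toy title", html))
```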
William Revelle
See also filesList
, filesInfo
if(interactive()) {
#This is an interactive function which requires interactive input and thus is not given as an example
rd2html()
}
Input from a variety of sources may be read. Matrices or data.frames may be read from files with suffixes of .txt, .text, .TXT, .dat, .DATA, .data, .csv, .rds, .rda, .xpt, .XPT, or .sav (i.e., data from SPSS sav files may be read, as can files saved by SAS using the .xpt option). Data exported by JMP or EXCEL in the csv format may also be read. Fixed Width Files saved in .txt mode may be read if the widths parameter is specified. Files saved with saveRDS have suffixes of .rds or .Rds, and are read using readRDS. Files associated with objects with suffixes .rda and .Rda are loaded (following a security prompt). The default values for read.spss are adjusted for more standard input from SPSS files.
Input from the clipboard is easy but a bit obscure, particularly for Mac users. read.clipboard
and its variations are just an easier way to do so. Data may be copied to the clipboard from Excel spreadsheets, csv files, or fixed width formatted files and then read into a data.frame. Data may also be read from lower (or upper) triangular matrices and filled out to square matrices. Writing text files may be done using write.file
which will prompt for a file name (if not given) and then write or save to that file depending upon the suffix (text, txt, or csv will call write.table; R or r will dput; rda or Rda will save; Rds or rds will saveRDS).
read.file(file=NULL,header=TRUE,use.value.labels=FALSE,to.data.frame=TRUE,sep=",",
  quote="\"", widths=NULL,f=NULL, filetype=NULL,...)
#for .txt, .text, TXT, .csv, .sav, .xpt, XPT, R, r, Rds, .rds, or .rda,
# .Rda, .RData, .Rdata, .dat and .DAT files
read.clipboard(header = TRUE, ...) #assumes headers and tab or space delimited
read.clipboard.csv(header=TRUE,sep=',',...) #assumes headers and comma delimited
read.clipboard.tab(header=TRUE,sep='\t',...) #assumes headers and tab delimited
#read in a matrix given the lower off diagonal
read.clipboard.lower(diag=TRUE,names=FALSE,...)
read.clipboard.upper(diag=TRUE,names=FALSE,...)
#read in data using a fixed format width (see read.fwf for instructions)
read.clipboard.fwf(header=FALSE,widths=rep(1,10),...)
read.https(filename,header=TRUE)
read.file.csv(file=NULL,header=TRUE,f=NULL,...)
#For output:
#be sure to specify the file type in name
write.file(x,file=NULL,row.names=FALSE,f=NULL,...)
write.file.csv(x,file=NULL,row.names=FALSE,f=NULL,...)
header |
Does the first row have variable labels (generally assumed to be TRUE). |
sep |
What is the designated separator between data fields? For typical csv files, this will be a comma, but if commas designate decimals, then a ; can be used to designate different records. |
quote |
Specified to |
diag |
for upper or lower triangular matrices, is the diagonal specified or not |
names |
for read.clipboard.lower or upper, are colnames in the first column |
widths |
how wide are the columns in fixed width input. The default is to read 10 columns of size 1. |
filename |
Name or address of remote https file to read. |
... |
Other parameters to pass to read |
f |
A file name to read from or write to. If omitted, |
file |
A file name to read from or write to. (same as f, but perhaps more intuitive). If omitted and if f is omitted,then |
x |
The data frame or matrix to write to f |
row.names |
Should the output file include the rownames? By default, no. |
to.data.frame |
Should the spss input be converted to a data frame? |
use.value.labels |
Should the SPSS input values be converted to numeric? |
filetype |
If specified the reading will use this term rather than the suffix. |
A typical session of R might involve data stored in text files, generated online, etc. Although it is easy to just read from a file (particularly if using read.file
), an alternative is to use one's local system to copy from the file to the clipboard and then read from the clipboard using read.clipboard
. This is very convenient (and somewhat more intuitive to the naive user). This is particularly useful when copying from a text book or article and just moving a section of text into R. However, copying from a file and then reading the clipboard is hard to automate in a script. Thus, read.file
will read from a file.
The read.file
function combines the file.choose
and either read.table
, read.fwf
, read.spss
or read.xport
(from foreign) or load
or readRDS
commands. By examining the file suffix, it chooses the appropriate way to read the file. For more complicated file structures, see the foreign package. For even more complicated file structures, see the rio or haven packages.
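The idea of choosing a reader from the file suffix can be sketched in a few lines of base R. The helper read_by_suffix below is hypothetical and is not the read.file code: it covers only a few suffixes, ignores their case, and leaves out the file.choose step.

```r
# Hypothetical helper (not read.file itself): pick a reader from the suffix.
read_by_suffix <- function(file, header = TRUE, ...) {
  suffix <- tolower(tools::file_ext(file))   # simplification: case is ignored
  switch(suffix,
    csv  = read.table(file, header = header, sep = ",", ...),
    txt  = ,
    text = ,
    dat  = ,
    data = read.table(file, header = header, ...),
    rds  = readRDS(file),
    stop("unsupported or missing suffix: '", suffix, "'")
  )
}

# Round trip through a temporary csv file
f <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = c("x", "y", "z")), f, row.names = FALSE)
d <- read_by_suffix(f)
str(d)
```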
Note that read.file
assumes by default that the first row has column labels (header =TRUE). If this is not true, then make sure to specify header = FALSE. If the file is fixed width, the assumption is that it does not have a header field. In the unlikely case that a fwf file does have a header, then you probably should try fn <- file.choose() and then my.data <- read.fwf(fn,header=TRUE,widths= widths).
Further note: If the file is a .Rda, .rda, etc. file, the read.file command will return the name and location of the file. It will prompt the user to load this file. In this case, it is necessary to either assign the output (the file name) to an object that has a different name than any of the objects in the file, or to call read.file() without any specification. Notice that loading an .Rda file can overwrite existing objects. Thus the warning and the need to do the second step.
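The overwriting problem, and one way around it, can be seen with base R alone. This sketch uses a made-up object named x and a temporary file; loading into a separate environment is an alternative shown for illustration, not what read.file does.

```r
f <- tempfile(fileext = ".rda")
x <- 1:3
save(x, file = f)              # the file now holds an object named 'x'

x <- "something I still need"  # a different 'x' in the workspace

# load() into its own environment so the workspace copy of 'x' survives.
e <- new.env()
loaded <- load(f, envir = e)   # load() returns the names of the loaded objects
loaded
e$x
x
```

Calling load(f) without the envir argument would instead replace the workspace x with the saved one.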
If the file has no suffix the default action is to quit with a warning. However, if the filetype is specified, it will use that type in the reading (e.g. filetype="txt" will read as text file, even if there is no suffix).
If the file is specified and has a prefix of http:// or https:// it will be downloaded and then read.
Currently supported input formats are
.sav | SPSS .sav files |
.csv | A comma separated file (e.g. from Excel or Qualtrics) |
.txt | A typical text file |
.TXT | A typical text file |
.text | A typical text file |
.data | A data file |
.dat | A data file |
.rds | An R data file |
.Rds | An R data file (created by a write) |
.Rda | An R data structure (created using save) |
.rda | An R data structure (created using save) |
.RData | An R data structure (created using save) |
.rdata | An R data structure (created using save) |
.R | An R data structure created using dput |
.r | An R data structure created using dput |
.xpt | A SAS data file in xport format |
.XPT | A SAS data file in XPORT format |
Some data files have an extra ' in the data (e.g., the NYT covid data base). These files can be read by specifying quote=""
The foreign function read.spss
is used to read SPSS .sav files using the most common options. Just as read.spss
issues various warnings, so does read.file
. In general, these can be ignored. For more detailed information about using read.spss
, see the help pages in the foreign package.
If you have a file written by JMP, you must first export to a csv or text file.
The write.file
function combines the file.choose
and either write.table
or saveRDS
. By examining the file suffix, it chooses the appropriate way to write. For more complicated file structures, see the foreign package, or the save function in R Base. If no suffix is added, it will write as a .txt file. write.file.csv
will write in csv format to an arbitrary file name.
Currently supported output formats are
.csv | A comma separated file (e.g. for reading into Excel) |
.txt | A typical text file |
.text | A typical text file |
.rds | An R data file |
.Rds | An R data file (created by a write) |
.Rda | An R data structure (created using save) |
.rda | An R data structure (created using save) |
.R | An R data structure created using dput |
.r | An R data structure created using dput |
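The mapping from suffix to writer in the table above can be sketched as a small base R function. write_by_suffix is a hypothetical name, and the sketch is not the write.file code: it omits the file.choose prompt, and the handling of a missing suffix is an assumption based on the description above.

```r
# Hypothetical helper (not write.file itself): pick a writer from the suffix.
write_by_suffix <- function(x, file, row.names = FALSE) {
  suffix <- tools::file_ext(file)
  if (suffix == "") {            # assumption: no suffix means write as .txt
    file <- paste0(file, ".txt")
    suffix <- "txt"
  }
  switch(suffix,
    csv  = write.table(x, file, sep = ",", row.names = row.names),
    txt  = ,
    text = write.table(x, file, row.names = row.names),
    rds  = ,
    Rds  = saveRDS(x, file),
    rda  = ,
    Rda  = save(x, file = file),  # the object is saved under the name 'x'
    r    = ,
    R    = dput(x, file),
    stop("unsupported suffix: '", suffix, "'")
  )
  invisible(file)                 # return the path that was written
}

df <- data.frame(id = 1:2, score = c(3.5, 4.0))
f_rds <- write_by_suffix(df, tempfile(fileext = ".rds"))
f_r   <- write_by_suffix(df, tempfile(fileext = ".R"))
identical(readRDS(f_rds), df)
all.equal(dget(f_r), df)
```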
Many Excel based files specify missing values as a blank field. When reading from the clipboard, using read.clipboard.tab
will change these blank fields to NA.
Sometimes missing values are specified as "." or "999", or some other values. These can be converted by the read.file command specifying what values are missing (e.g., na ="."). See the example for the reading from the remote mtcars.csv file.
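In base R the same conversion is done with the na.strings argument of read.table and read.csv. A small sketch on made-up csv text, where both "." and "999" stand for missing values:

```r
# Toy csv text (made up) where "." and "999" both mean 'missing'.
csv_text <- "id,score,age
1,10,25
2,.,31
3,12,999"

d <- read.csv(text = csv_text, na.strings = c(".", "999"))
d
colSums(is.na(d))   # one missing score and one missing age
```

Because "." is treated as missing rather than as text, the score column stays numeric.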
read.clipboard
was based upon a suggestion by Ken Knoblauch to the R-help listserve.
If the input file that was copied into the clipboard was an Excel file with blanks for missing data, then read.clipboard.tab() will correctly replace the blanks with NAs. Similarly for a csv file with blank entries, read.clipboard.csv will replace empty fields with NA.
read.clipboard.lower
and read.clipboard.upper
are adapted from John Fox's read.moments function in the sem package. They will read a lower (or upper) triangular matrix from the clipboard and return a full, symmetric matrix for use by factanal, fa
, ICLUST
, pca
, omega
, etc. If diag=FALSE, the diagonal will be filled with 1.0s. These two functions were added to allow easy reading of examples from various texts and manuscripts with just triangular output.
Many articles will report lower triangular matrices with variable labels in the first column. read.clipboard.lower will handle this case. Names must be in the first column if names=TRUE is specified.
Other articles will report upper triangular matrices with variable labels in the first row. read.clipboard.upper will handle this. Note that labels in the first column will not work for read.clipboard.upper. The names, if present, must be in the first row.
Consider the following lower triangular matrix. To read it, copy it to the clipboard and call read.clipboard.lower(names=TRUE)
A1 1.00 |
A2 -0.34 1.00 |
A3 -0.27 0.49 1.00 |
A4 -0.15 0.34 0.36 1.00 |
A5 -0.18 0.39 0.50 0.31 1.00 |
C1 0.03 0.09 0.10 0.09 0.12 1.00 |
However, if the data are strung out e.g.,
-.34 |
-.27 |
-.15 |
-.18 |
.03 |
.49 |
.34 |
.39 |
.09 |
.36 |
.50 |
.10 |
.31 |
.09 |
.12 |
Then one needs to read it using the read.clipboard.upper(names=FALSE,diag=FALSE) option.
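Why the strung out values need the upper option can be seen by filling the matrix by hand in base R. The values run down the columns of the lower triangle, which is the same order as running across the rows of the upper triangle. This is a sketch of the idea only, not the read.clipboard code; the values are typed in rather than read from the clipboard.

```r
vars <- c("A1", "A2", "A3", "A4", "A5", "C1")
# The strung out values from above, without the diagonal.
vals <- c(-.34, -.27, -.15, -.18, .03,
           .49,  .34,  .39,  .09,
           .36,  .50,  .10,
           .31,  .09,
           .12)

R <- diag(length(vars))                 # 1.0s on the diagonal (the diag=FALSE case)
dimnames(R) <- list(vars, vars)
R[lower.tri(R)] <- vals                 # R fills the lower triangle column by column
R[upper.tri(R)] <- t(R)[upper.tri(R)]   # mirror it to get a full symmetric matrix
R
```

The result matches the labeled lower triangular matrix shown earlier, filled out to a square matrix.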
read.clipboard.fwf will read fixed format files from the clipboard. It includes a patch to read.fwf, which otherwise will not read from the clipboard or from a remote file. See read.fwf for documentation of how to specify the widths.
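The widths argument works the same way as in base R's read.fwf. A short sketch with a temporary file of made-up records, where each line holds a two character id, a three character code, and a one character response:

```r
# Three made-up records: 2 characters of id, 3 of code, 1 of response.
f <- tempfile(fileext = ".txt")
writeLines(c("01abc3", "02def1", "10ghi4"), f)

d <- read.fwf(f, widths = c(2, 3, 1), col.names = c("id", "code", "resp"))
d
```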
The contents of the file to be read or of the clipboard. Saved as a data.frame.
William Revelle
#All of these functions are meant for interactive Input
#Because these are dynamic functions, they need to be run interactively and
# can not be run as examples.
#Thus they are not to be tested by CRAN
if(interactive()) {
my.data <- read.file() #search the directory for a file and then read it.
#return the result into an object
#or, if the file is a rda, etc. file
my.data <- read.file() #return the path and instructions of how to load
# without assigning a value.
filesList() #search the system for a particular file and then list all the files in that directory
fileCreate() #search for a particular directory and create a file there.
write.file(Thurstone) #open the search window, choose a location and name the output file,
# write the data file (e.g., Thurstone ) to the file chosen
#the example data set from read.delim in the readr package to read a remote csv file
my.data <-read.file(
"https://github.com/tidyverse/readr/raw/master/inst/extdata/mtcars.csv",
na=".") #the na option is used for an example, but is not needed for these data
#These functions read from the local clipboard and thus are interactive
my.data <- read.clipboard() #space delimited columns
my.data <- read.clipboard.csv() # , delimited columns
my.data <- read.clipboard.tab() #typical input if copied from a spreadsheet
my.data <- read.clipboard(header=FALSE) #data start on line 1
my.matrix <- read.clipboard.lower()
}
Given a set of numeric codes, change their values to different values given a mapping function. Also included is the ability to reorder columns or to convert wide sets of columns to long form.
rearrange(x,pattern) #reorder the variables
wide2long(x,width, cname=NULL, idname = NULL, idvalues=NULL ,pattern=NULL)
recode(x, where, isvalue, newvalue) #recode text values to numeric values
x |
A matrix or data frame of numeric values |
where |
The column numbers to fix |
isvalue |
A vector of values to change |
newvalue |
A vector of the new values |
pattern |
column order of repeating patterns |
width |
width of long format |
cname |
Variable names of long format |
idname |
Name of first column |
idvalues |
Values to fill first column |
Three functions for basic recoding are included.
recode: Sometimes, data are entered as levels in an incorrect order. Once converted to numeric values, this can lead to confusion. Recoding of the data to the correct order is straightforward, if tedious.
rearrange: Another tedious problem is when the output of one function needs to be arranged for better data handling in a subsequent function. Specify a pattern for choosing the new columns.
wide2long: And then, having rearranged the data, perhaps convert the file to long format.
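The mapping step behind recode can be sketched in base R with match. The helper recode_values is hypothetical; it borrows the argument names from the usage above for readability but is not the package's recode, and the toy data are made up.

```r
# Hypothetical helper (not the package's recode): map old values to new ones
# in the chosen columns.
recode_values <- function(x, where, isvalue, newvalue) {
  x <- as.data.frame(x)
  for (j in where) {
    # match() gives the position of each entry in isvalue;
    # entries that are not in isvalue become NA
    x[[j]] <- newvalue[match(x[[j]], isvalue)]
  }
  x
}

temp <- data.frame(a = c(1, 3, 2), b = c(3, 3, 1))
recoded <- recode_values(temp, where = 1:2, isvalue = 1:3,
                         newvalue = c("low", "mid", "high"))
recoded
```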
The reordered data
Although perhaps useful, the recode function is definitely ugly code. For smaller data sets, the results from char2numeric back to the original will not work. char2numeric works column wise and orders the data in each column.
William Revelle
mlArrange in the psych package for a more general version of wide2long
x <- matrix(1:120,ncol=12)
new <- rearrange(x,pattern = c(1,4, 7,10))
new
long <- wide2long(x,width=3,pattern=c(1,4, 7,10)) #rearrange and then make long
temp <- bfi[1:100,1:5]
isvalue <- 1:6
newvalue <- psych::cs(one,two,three,four,five,six)
newtemp <- recode(temp,1:5,isvalue,newvalue)
newtemp #characters
temp.num <- psych::char2numeric(newtemp) #convert to numeric
temp.num #notice the numerical values have changed
new.temp.num <- recode(temp.num, 1:5, isvalue=c(3,6,5,2,1,4), newvalue=1:6)
#note that because char2numeric works column wise, this will fail for small sets
State Anxiety was measured two or three times in 11 studies at the Personality-Motivation-Cognition laboratory. Here are item responses for 11 studies (9 repeated twice, 2 repeated three times). In all studies, the first occasion was before a manipulation. In some studies, caffeine, movies, or incentives were then given to some of the participants before the second and third STAI were given. In addition, Trait measures are available and included in the tai data set (3032 subjects).
data(sai)
data(tai)
data(sai.dictionary)
A data frame with 3032 unique observations on the following 23 variables.
id
a numeric vector
study
a factor with levels ages
cart
fast
fiat
film
flat
home
pat
rob
salt
shed
shop
xray
time
1=First, 2 = Second, 3=third administration
TOD
TOD (time of day: 1 = 8:50-9:30 am, 2 = 1-3 pm, 3 = 7-8 pm)
drug
drug (placebo (0) vs. caffeine (1))
film
film (1 = Frontline (concentration camp), 2 = Halloween, 3 = National Geographic (control), 4 = Parenthood (humor))
anxious
anxious
at.ease
at ease
calm
calm
comfortable
comfortable
confident
confident
content
content
high.strung
high.strung
jittery
jittery
joyful
joyful
nervous
nervous
pleasant
pleasant
rattled
over-excited and rattled
regretful
regretful
relaxed
relaxed
rested
rested
secure
secure
tense
tense
upset
upset
worried
worried
worrying
worrying
The standard experimental study at the Personality, Motivation and Cognition (PMC) laboratory (Revelle and Anderson, 1997) was to administer a number of personality trait and state measures (e.g. the epi
, msq
, msqR
and sai
) to participants before some experimental manipulation of arousal/effort/anxiety. Following the manipulation (with a 30 minute delay if giving caffeine/placebo), some performance task was given, followed once again by measures of state arousal/effort/anxiety.
Here are the item level data on the sai
(state anxiety) and the tai
(trait anxiety). Scores on these scales may be found using the scoring keys. The affect
data set includes pre and post scores for two studies (flat and maps) which manipulated state by using four types of movies.
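In a scoring key, a leading "-" marks an item that is reverse keyed before the items are averaged. The base R sketch below shows that arithmetic on made-up responses; the 1 to 4 response scale, the toy data, and the helper score_scale are assumptions for illustration. In practice the scoreItems function in the psych package does this scoring.

```r
# Toy responses on an assumed 1-4 scale (made-up values, not the sai data).
toy <- data.frame(tense = c(1, 4, 2), anxious = c(2, 4, 1),
                  calm  = c(4, 1, 3), relaxed = c(3, 1, 4))
keys <- c("tense", "anxious", "-calm", "-relaxed")   # "-" marks reversed items

# Hypothetical helper: mean of the keyed items after reversing the "-" items.
score_scale <- function(x, keys, lo = 1, hi = 4) {
  reversed <- startsWith(keys, "-")
  items <- sub("^-", "", keys)                   # item names without the sign
  x <- x[, items, drop = FALSE]
  x[, reversed] <- (hi + lo) - x[, reversed]     # 1 <-> 4, 2 <-> 3
  rowMeans(x, na.rm = TRUE)
}

anx <- score_scale(toy, keys)
anx
```

The second toy participant is high on tense and anxious and low on calm and relaxed, and so receives the highest anxiety score.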
In addition to being useful for studies of motivational state, these studies provide examples of test-retest and alternate form reliabilities. Given that 10 items overlap with the msqR
data, they also allow for a comparison of immediate duplication of items with 30 minute delays.
Studies CART, FAST, SHED, RAFT, and SHOP were either control groups, or did not experimentally vary arousal/effort/anxiety.
AGES, CITY, EMIT, RIM, SALT, and XRAY were caffeine manipulations between time 1 and 2 (RIM and VALE were repeated day 1 and day 2)
FIAT, FLAT, MAPS, MIXX, and THRU were 1 day studies with film manipulation between time 1 and time 2.
SAM1 and SAM2 were the first and second day of a two day study. The STAI was given once per day. MSQ not MSQR was given.
VALE and PAT were two day studies with the STAI given pre and post on both days
RIM was a two day study with the STAI and MSQ given once per day.
Usually, time of day 1 = 8:50-9 am, and 2 = 7:30 pm; however, in rob, with paid subjects, the times were 0530 and 22:30.
Data collected at the Personality, Motivation, and Cognition Laboratory, Northwestern University, between 1991 and 1999.
Charles D. Spielberger and Richard L. Gorsuch and R. E. Lushene, (1970) Manual for the State-Trait Anxiety Inventory.
Revelle, William and Anderson, Kristen Joan (1997) Personality, motivation and cognitive performance: Final report to the Army Research Institute on contract MDA 903-93-K-0008
Rafaeli, Eshkol and Revelle, William (2006), A premature consensus: Are happiness and sadness truly opposite affects? Motivation and Emotion, 30, 1, 1-12.
Smillie, Luke D. and Cooper, Andrew and Wilt, Joshua and Revelle, William (2012) Do Extraverts Get More Bang for the Buck? Refining the Affective-Reactivity Hypothesis of Extraversion. Journal of Personality and Social Psychology, 103 (2), 306-326.
data(sai)
table(sai$study, sai$time)  #show the counts for repeated measures

#Here are the keys to score the sai total score, positive and negative items
sai.keys <- list(
  sai = c("tense", "regretful", "upset", "worrying", "anxious", "nervous", "jittery",
          "high.strung", "worried", "rattled", "-calm", "-secure", "-at.ease", "-rested",
          "-comfortable", "-confident", "-relaxed", "-content", "-joyful", "-pleasant"),
  sai.p = c("calm", "at.ease", "rested", "comfortable", "confident", "secure", "relaxed",
            "content", "joyful", "pleasant"),
  sai.n = c("tense", "anxious", "nervous", "jittery", "rattled", "high.strung", "upset",
            "worrying", "worried", "regretful"))

tai.keys <- list(
  tai = c("-pleasant", "nervous", "not.satisfied", "wish.happy", "failure", "-rested",
          "-calm", "difficulties", "worry", "-happy", "disturbing.thoughts",
          "lack.self.confidence", "-secure", "decisive", "inadequate", "-content",
          "thoughts.bother", "disappointments", "-steady", "tension"),
  tai.pos = c("pleasant", "-wish.happy", "rested", "calm", "happy", "secure", "content",
              "steady"),
  tai.neg = c("nervous", "not.satisfied", "failure", "difficulties", "worry",
              "disturbing.thoughts", "lack.self.confidence", "decisive", "inadequate",
              "thoughts.bother", "disappointments", "tension"))

#using the is.element function instead of the %in% function
#just get the control subjects
control <- subset(sai, is.element(sai$study, c("Cart", "Fast", "SHED", "RAFT", "SHOP")))
#pre and post drug studies
drug <- subset(sai, is.element(sai$study, c("AGES", "CITY", "EMIT", "SALT", "VALE", "XRAY")))
#pre and post film studies
film <- subset(sai, is.element(sai$study, c("FIAT", "FLAT", "MAPS", "MIXX")))

#this next set allows us to score those sai items that overlap with the msq item sets
msq.items <- c("anxious", "at.ease", "calm", "confident", "content", "jittery", "nervous",
               "relaxed", "tense", "upset")  #these overlap with the msq

sai.msq.keys <- list(
  pos = c("at.ease", "calm", "confident", "content", "relaxed"),
  neg = c("anxious", "jittery", "nervous", "tense", "upset"),
  anx = c("anxious", "jittery", "nervous", "tense", "upset",
          "-at.ease", "-calm", "-confident", "-content", "-relaxed"))

sai.not.msq.keys <- list(
  pos = c("secure", "rested", "comfortable", "joyful", "pleasant"),
  neg = c("regretful", "worrying", "high.strung", "worried", "rattled"),
  anx = c("regretful", "worrying", "high.strung", "worried", "rattled",
          "-secure", "-rested", "-comfortable", "-joyful", "-pleasant"))

sai.alternate.forms <- list(
  pos1 = c("at.ease", "calm", "confident", "content", "relaxed"),
  neg1 = c("anxious", "jittery", "nervous", "tense", "upset"),
  anx1 = c("anxious", "jittery", "nervous", "tense", "upset",
           "-at.ease", "-calm", "-confident", "-content", "-relaxed"),
  pos2 = c("secure", "rested", "comfortable", "joyful", "pleasant"),
  neg2 = c("regretful", "worrying", "high.strung", "worried", "rattled"),
  anx2 = c("regretful", "worrying", "high.strung", "worried", "rattled",
           "-secure", "-rested", "-comfortable", "-joyful", "-pleasant"))

sai.repeated <- c("AGES", "Cart", "Fast", "FIAT", "FILM", "FLAT", "HOME", "PAT", "RIM",
                  "SALT", "SAM", "SHED", "SHOP", "VALE", "XRAY")
sai12 <- subset(sai, is.element(sai$study, sai.repeated))  #the subset with repeated measures

#Choose those studies with repeated measures by :
sai.control <- subset(sai, is.element(sai$study, c("Cart", "Fast", "SHED", "SHOP")))
sai.film <- subset(sai, is.element(sai$study, c("FIAT", "FLAT")))
sai.drug <- subset(sai, is.element(sai$study, c("AGES", "SALT", "VALE", "XRAY")))
sai.day <- subset(sai, is.element(sai$study, c("SAM", "RIM")))
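The scoring keys in the example mark reverse-keyed items with a leading "-". As a minimal sketch of what that convention means, the following base R code scores a small made-up data set. It is not the psych implementation: the toy items, the assumed 1 to 4 response range, and the helper name score_with_keys are all assumptions made for illustration.

```r
# Hypothetical helper: score one scale from a character vector of signed keys.
# Assumes every item shares the same response range (min_resp..max_resp).
score_with_keys <- function(items, keys, min_resp = 1, max_resp = 4) {
  reversed <- startsWith(keys, "-")          # "-calm" means reverse-key "calm"
  names_only <- sub("^-", "", keys)          # strip the sign to get the column name
  x <- as.matrix(items[, names_only, drop = FALSE])
  # Reverse-keyed items are reflected around the scale midpoint
  x[, reversed] <- (max_resp + min_resp) - x[, reversed]
  rowMeans(x, na.rm = TRUE)
}

# Made-up responses for three people on four items (not the sai data)
toy <- data.frame(tense = c(4, 1, 2), anxious = c(3, 1, 2),
                  calm  = c(1, 4, 3), relaxed = c(2, 4, 3))
toy.keys <- c("tense", "anxious", "-calm", "-relaxed")
toy.scores <- score_with_keys(toy, toy.keys)
toy.scores
```

With the real data, the key lists defined in the example (sai.keys, tai.keys) would normally be passed to a psych scoring function such as psych::scoreItems rather than to a hand-written helper like this one.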
Four predictors of academic salary, used as examples in Cohen, Cohen, Aiken, and West (2003), may be used to demonstrate multiple regression and multiple correlation.
data("salary")
A data frame with 62 observations on the following 5 variables.
time
Time since Ph.D.
publications
Number of publications
female
Gender: Male = 0, Female = 1
citations
Number of citations
salary
Salary
Two extended examples of multiple regression are discussed in Chapter 3 of CCAW.
These are nice examples of the use of the psych::lmCor and psych::partial.r functions.
Note that the example data set in Table 3.2.1 (p 67) is just the first 15 cases of the complete data set used in Table 3.5.1 (page 81) and included in this data set.
CD accompanying Cohen, Cohen, Aiken and West (2003) (used with the kind permission of Leona Aiken and Steven West)
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Lawrence Erlbaum Associates Publishers.
data(salary)
psych::describe(salary)
psych::pairs.panels(salary)
#the standardized coefficients
psych::lmCor(salary ~ time + publications, data=salary)
#or the raw coefficients
mod <- psych::lmCor(salary ~ time + publications, data=salary, std=FALSE)
mod
#show the part correlations
psych::partial.r(salary ~ time - publications, data=salary, part=TRUE)
psych::partial.r(salary ~ -time + publications, data=salary, part=TRUE)
#show the predicted salaries based upon the model
mod <- psych::lmCor(salary ~ time + publications + citations + female, data=salary, std=FALSE)
predicted.salary <- psych::predict.psych(mod, salary)
head(predicted.salary)  #compare to CCAW p 81
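The standardized coefficients in the example can be understood as a computation on the correlation matrix alone. The sketch below illustrates that idea in base R with simulated data; the variable names merely echo the salary data set, the values are random, and this is not a description of how lmCor is implemented internally.

```r
set.seed(42)
n <- 200
# Simulated stand-ins for the salary variables (not the CCAW data)
time <- rnorm(n)
publications <- 0.6 * time + rnorm(n)
salary <- 0.5 * time + 0.3 * publications + rnorm(n)
toy <- data.frame(salary, time, publications)

R <- cor(toy)
Rxx <- R[c("time", "publications"), c("time", "publications")]  # predictor intercorrelations
rxy <- R[c("time", "publications"), "salary"]                    # predictor-criterion correlations

beta <- solve(Rxx, rxy)          # standardized regression weights
R2 <- sum(beta * rxy)            # squared multiple correlation

# Cross-check against lm() on standardized variables
fit <- lm(salary ~ time + publications, data = as.data.frame(scale(toy)))
cbind(from.R = beta, from.lm = coef(fit)[-1])
c(R2 = R2, lm.R2 = summary(fit)$r.squared)
```

Because only correlations are needed, the same arithmetic applies when the raw data are unavailable and just a correlation matrix has been published.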
Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.
data(sat.act)
A data frame with 700 observations on the following 6 variables.
gender
males = 1, females = 2
education
self reported education 1 = high school ... 5 = graduate work
age
age
ACT
ACT composite scores may range from 1 - 36. National norms have a mean of 20.
SATV
SAT Verbal scores may range from 200 - 800.
SATQ
SAT Quantitative scores may range from 200 - 800
These items were collected as part of the SAPA project (https://www.sapa-project.org/) to develop online measures of ability (Revelle, Wilt and Rosenthal, 2009). The score means are higher than national norms, suggesting both self selection for people taking online personality and ability tests and a self reporting bias in scores.
See also the iqitems data set.
https://personality-project.org/
Revelle, William, Wilt, Joshua, and Rosenthal, Allen (2009) Personality and Cognition: The Personality-Cognition Link. In Gruszka, Alexandra and Matthews, Gerald and Szymura, Blazej (Eds.) Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, Springer.
data(sat.act)
psych::describe(sat.act)
psych::pairs.panels(sat.act)
Shapiro and ten Berge use the Schutz correlation matrix as an example for Minimum Rank Factor Analysis. The Schutz data set is also a nice example of how normal minres or maximum likelihood will lead to a Heywood case, but minrank factoring will not.
data("Schutz")
The format is: num [1:9, 1:9] 1 0.8 0.28 0.29 0.41 0.38 0.44 0.4 0.41 0.8 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:9] "Word meaning" "Odd Words" "Boots" "Hatchets" ... ..$ : chr [1:9] "V1" "V2" "V3" "V4" ...
These are 9 cognitive variables of importance mainly because they are used as an example by Shapiro and ten Berge for their paper on Minimum Rank Factor Analysis.
The solution from the fa function with the fm='minrank' option is very close (but not exactly equal) to their solution.
This example is used to show problems with different methods of factoring. Of the various factoring methods, fm = "minres", "uls", or "mle" produce a Heywood case. Minrank, alpha, and pa do not.
See the blant data set for another example of differences across methods.
Richard E. Schutz (1958) Factorial Validity of the Holzinger-Crowder Uni-factor tests. Educational and Psychological Measurement, 18, 873-875.
Alexander Shapiro and Jos M.F. ten Berge (2002) Statistical inference of minimum rank factor analysis. Psychometrika, 67, 70-94.
data(Schutz)
psych::corPlot(Schutz, numbers=TRUE, upper=FALSE)
f4min <- psych::fa(Schutz, 4, fm="minrank")  #for an example of minimum rank factor analysis
#compare to
f4 <- psych::fa(Schutz, 4, fm="mle")  #for the maximum likelihood solution which has a Heywood case
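A Heywood case means that a variable's communality reaches or exceeds 1, so its uniqueness is zero or negative. A minimal base R sketch of that check follows, using a made-up loadings matrix rather than the Schutz solution. The helper name heywood_check and the tolerance are assumptions, and the row-sum formula for communalities holds only for orthogonal factors.

```r
# Hypothetical helper: flag variables whose communality reaches or exceeds 1
heywood_check <- function(loadings, tol = 1e-6) {
  h2 <- rowSums(loadings^2)       # communality = sum of squared loadings (orthogonal factors)
  u2 <- 1 - h2                    # uniqueness
  data.frame(h2 = h2, u2 = u2, heywood = u2 <= tol)
}

# Made-up two-factor loadings; V3 is constructed to be a Heywood case
L <- matrix(c(0.7, 0.2,
              0.6, 0.3,
              0.9, 0.5,
              0.3, 0.8), ncol = 2, byrow = TRUE,
            dimnames = list(paste0("V", 1:4), c("F1", "F2")))
hw <- heywood_check(L)
hw
```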
Select a subset of a data.frame or matrix for columns meeting specific criteria. Can do logical AND (default) or OR of the resulting search. Columns (variables) are specified by name and the conditions to meet include equality, less than, more than or inequality to a specified set of values. splitBy creates new dichotomous variables based on the splitting criteria.
selectBy(x, by)
splitBy(x, by, new=FALSE)
x | A data frame or matrix |
by | A quote delimited string of variables and criteria values. Multiple variables may be separated by commas (default to AND) |
new | If TRUE, return a new data frame with just the dichotomous variables; otherwise concatenate the new variables to the right margin of x |
Two relatively trivial functions to help those less familiar with the subset function or how to use [] to select variables.
The subset of the original data.frame with just the cases that meet the criteria (selectBy) or new variables, recoded 0,1 (splitBy).
selectBy is equivalent to subsetting x with a logical condition, e.g. small <- x[x$variable == value, ], or to using the subset function, e.g. small <- subset(x, x$variable == value).
William Revelle
vJoin for another data manipulation function.
testand <- selectBy(attitude, 'rating < 70 & complaints > 60')  #AND
dim(testand)
testor <- selectBy(attitude, 'rating < 60 | complaints > 60')  #OR
dim(testor)
test <- splitBy(attitude, 'rating > 70 , complaints > 60')
psych::headTail(test)
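For readers who want to see the base R operations these helpers stand in for, the sketch below repeats the same selections on the built-in attitude data using logical indexing and subset. The names of the new 0/1 columns are invented here and are not necessarily the names splitBy produces.

```r
# attitude is a built-in R data set (datasets package)
# AND of two conditions, written two equivalent ways
and1 <- attitude[attitude$rating < 70 & attitude$complaints > 60, ]
and2 <- subset(attitude, rating < 70 & complaints > 60)
all.equal(and1, and2)

# OR of two conditions
or1 <- subset(attitude, rating < 60 | complaints > 60)
dim(or1)

# Dichotomous (0/1) indicator variables, similar in spirit to splitBy
split1 <- cbind(attitude,
                rating.gt.70 = as.numeric(attitude$rating > 70),
                complaints.gt.60 = as.numeric(attitude$complaints > 60))
head(split1)
```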
Project Talent gave 440,000 US high school students a number of personality and ability tests. Of these, the data for 346,000 were available for followup. Subsequent followups were collected 11 and 50 years later. Marion Spengler and her colleagues Rodica Damian and Brent Roberts reported on the stability and change across 50 years of personality and ability. Here is the correlation matrix of 25 of their variables (Spengler) as well as a slightly different set of 19 variables (Damian). This is a nice example of mediation and regression from a correlation matrix.
data("Damian")
A 25 x 25 correlation matrix of demographic, personality, and ability variables, based upon 346,660 participants.
Race/Ethnicity
1 = other, 2 = white/caucasian
Sex
1=Male, 2=Female
Age
Cohort =9th grade, 10th grade, 11th grade, 12th grade
Parental
Parental SES based upon 9 questions of home value, family income, etc.
IQ
Standardized composite of Verbal, Spatial and Mathematical
Sociability etc.
10 scales based upon prior work by Damian and Roberts
Maturity
A higher order factor from the prior 10 scales
Extraversion
The second higher order factor
Interest
Self reported interest in school
Reading
Self report reading skills
Writing
Self report writing skills
Responsible
Self reported responsibility scale
Ed.11
Education level at 11 year followup
Educ.50
Education level at 50 year followup
OccPres.11
Occupational Prestige at 11 year followup
OccPres.50
Occupational Prestige at 50 year followup
Income.11
Income at 11 year followup
Income.50
Income at 50 year followup
Data from Project Talent were collected in 1960 on a representative sample of American high school students. Subsequent followups 11 and 50 years later are reported by Spengler et al. (2018) and others.
Marion Spengler, supplementary material to Damian et al. and Spengler et al.
Rodica Ioana Damian and Marion Spengler and Andreea Sutu and Brent W. Roberts (2019) Sixteen going on sixty-six: A longitudinal study of personality stability and change across 50 years. Journal of Personality and Social Psychology, 117 (3), 674-695.
Marion Spengler and Rodica Ioana Damian and Brent W. Roberts (2018) How you behave in school predicts life success above and beyond family background, broad traits, and cognitive ability. Journal of Personality and Social Psychology, 114 (4), 600-636.
data(Damian)
Spengler.stat  #show the basic descriptives of the original data set
psych::lowerMat(Spengler[psych::cs(IQ,Parental,Ed.11,OccPres.50),
                         psych::cs(IQ,Parental,Ed.11,OccPres.50)])
psych::setCor(OccPres.50 ~ IQ + Parental + (Ed.11), data=Spengler)
#we reduce the number of subjects for faster replication in this example
mod <- psych::mediate(OccPres.50 ~ IQ + Parental + (Ed.11), data=Spengler,
                      n.iter=50, n.obs=1000)  #for speed
summary(mod)
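Mediation from a correlation matrix rests on a simple decomposition: the total effect of a predictor equals its direct effect plus the indirect effect through the mediator. The base R sketch below shows this for the simplest case of one predictor, one mediator and one outcome. The correlations are made up, not taken from the Spengler matrix, and the example above has two predictors rather than one.

```r
# Made-up correlations among a predictor (X), mediator (M) and outcome (Y)
R <- matrix(c(1.0, 0.5, 0.4,
              0.5, 1.0, 0.6,
              0.4, 0.6, 1.0), 3, 3,
            dimnames = list(c("X", "M", "Y"), c("X", "M", "Y")))

c.total <- R["X", "Y"]                       # total effect of X on Y
a <- R["X", "M"]                             # X -> M
# Regress Y on X and M together: standardized weights from the correlation matrix
w <- unname(solve(R[c("X", "M"), c("X", "M")], R[c("X", "M"), "Y"]))
c.direct <- w[1]                             # direct effect of X controlling for M (c')
b <- w[2]                                    # M -> Y controlling for X
indirect <- a * b                            # mediated (indirect) effect

round(c(total = c.total, direct = c.direct, indirect = indirect), 3)
```

The bootstrap controlled by n.iter in the example is about the confidence interval for the indirect effect. This sketch covers only the point estimates.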
The SPI (SAPA Personality Inventory) is a set of 135 items primarily selected from the International Personality Item Pool (ipip.ori.org). This is an example data set collected using SAPA procedures at the sapa-project.org web site. This data set includes 10 demographic variables as well. The data set with 4000 observations on 145 variables may be used for examples in scale construction and validation, as well as empirical scale construction to predict multiple criteria.
data("spi")
data(spi.dictionary)
data(spi.keys)
A data frame with 4000 observations on the following 145 variables. (The q numbers are the SAPA item numbers).
age
Age in years from 11-90
sex
Reported biological sex (coded by X chromosomes => 1 = Male, 2 = Female)
health
Self rated health 1-5: poor, fair, good, very good, excellent
p1edu
Parent 1 education
p2edu
Parent 2 education
education
Respondents education: less than 12, HS grad, current univ, some univ, associate degree, college degree, in grad/prof, grad/prof degree
wellness
Self rated "wellness" 1-2
exer
Frequency of exercise: very rarely, < 1/month, < 1/wk, 1 or 2 times/week, 3-5/wk, > 5 times/week
smoke
never, not last year, < 1/month, <1/week, 1-3 days/week, most days, up to 5 x /day, up to 20 x /day, > 20x/day
ER
Emergency room visits none, 1x, 2x, 3 or more times
q_253
see the spi.dictionary for these items (q_253)
q_1328
see the dictionary for all items (q_1328)
Using the data contributed by about 125,000 visitors to the https://www.SAPA-project.org/ website, David Condon has developed a hierarchical framework for assessing personality at two levels. The higher level has the familiar five factors that have been studied extensively in personality research since the 1980s – Conscientiousness, Agreeableness, Neuroticism, Openness, and Extraversion. The lower level has 27 factors that are considerably more narrow. These were derived based on administrations of about 700 public-domain IPIP items to 3 large samples. Condon describes these scales as being "empirically-derived" because relatively little theory was used to select the number of factors in the hierarchy and the items in the scale for each factor (to be clear, he means relatively little personality theory though he relied on quite a lot of sampling and statistical theory). You can read all about the procedures used to develop this framework in his book/manual. If you would like to reproduce these analyses, you can download the data files from Dataverse (links are also provided in the manual) and compile this script in R (he used knitR). Instructions are provided in the Preface to the manual.
The content of the spi items may be seen by examining the spi.dictionary. Included in the dictionary are the item_id number from the SAPA project, the wording of the item, the source of the item, which Big 5 scale the item marks, and which "Little 27" scale the item marks.
This small subset of the data is provided for demonstration purposes.
https://sapa-project.org/research/SPI/SPIdevelopment.pdf.
Condon, D. (2017) The SAPA Personality Inventory: An empirically-derived, hierarchically-organized self-report personality assessment model (https://psyarxiv.com/sc4p9/)
An analysis using the spi data set and various tools from the psych package may be found in:
Revelle, Dworak and Condon, (2021) Exploring the persome: the power of the item in understanding personality structure. Personality and Individual Differences, 169, 1. Doi: 10.1016/j.paid.2020.109905.
data(spi)
data(spi.dictionary)
psych::bestScales(spi, criteria="health", dictionary=spi.dictionary)
sc <- psych::scoreVeryFast(spi.keys, spi)  #much faster scoring for just scores
sc <- psych::scoreOverlap(spi.keys, spi)   #gives the alpha reliabilities and various stats
#these are corrected for overlap
psych::corPlot(sc$corrected, numbers=TRUE, cex=.4, xlas=2, min.length=6,
   main="Structure of SPI (Corrected for overlap) disattenuated r above the diagonal)")
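The plot title in the example mentions disattenuated correlations. As a rough illustration of that one idea, the classical correction for attenuation divides an observed correlation by the square root of the product of the two reliabilities. The sketch below uses made-up numbers and a hypothetical helper name. It does not reproduce the additional correction for item overlap that scoreOverlap applies.

```r
# Hypothetical helper: correct an observed correlation for unreliability
disattenuate <- function(rxy, rxx, ryy) rxy / sqrt(rxx * ryy)

# Made-up values: observed r = .40, scale reliabilities (alpha) of .80 and .70
r.single <- disattenuate(0.40, 0.80, 0.70)
r.single

# Applied to a whole matrix of scale correlations with a vector of reliabilities
R <- matrix(c(1, 0.40, 0.40, 1), 2, 2)
alpha <- c(0.80, 0.70)
R.corrected <- R / sqrt(outer(alpha, alpha))
diag(R.corrected) <- 1
R.corrected
```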
The correlation matrix of 17 anthropometric measures from the United States Air Force survey of 2420 airmen. The data are taken from the Anthropometry package and included here as a demonstration of a hierarchical factor structure suitable for analysis by the omega or omegaSem functions.
data("USAF")
The format is: num [1:17, 1:17] 1 0.1148 -0.0309 -0.028 -0.0908 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:17] "age" "weight" "grip" "height" ... ..$ : chr [1:17] "age" "weight" "grip" "height" ...
The original data were collected by the USAF and reported in Churchill et al, 1977. They are included as a data file of 2420 participants and 202 variables (the first being an id) in the Anthropometry package. The list of variable names may be found in Churchill et al, on pages 99-103.
The three (correlated) factor structure shows a clear height, bulk, and head size structure with an overall general factor (g) which may be interpreted as body size.
The variables included (and their variable numbers in Anthropometry) are:
age | V1 |
weight | V2 |
grip strength | V12 |
height (stature) | V13 |
leg length | V26 |
knee height | V37 |
upper arm | V42 |
thumb tip reach | V47 |
in sleeve | V49 |
chest breadth | V52 |
hip breadth | V55 |
waist circumference | V71 |
thigh circumference | V97 |
scye circumference | V103 |
head circumference | V141 |
bitragion coronal | V145 |
head length | V150 |
glabella to wall | V181 |
external canthus to wall | V183 |
Note that these numbers are equivalent to the numbers in Churchill et al. The numbers in Anthropometry are these + 1.
Guillermo Vinue, Anthropometry: An R Package for Analysis of Anthropometric Data, Journal of Statistical Software, (2017), 77, 6. data set = USAFsurvey
Edmund Churchill, Thomas Churchill, Paul Kikta (1977) The AMRL anthropometric data bank library, volumes I-V. (Technical report AMRL-TR-77-1) https://apps.dtic.mil/dtic/tr/fulltext/u2/a047314.pdf
Guillermo Vinue, Anthropometry: An R Package for Analysis of Anthropometric Data, Journal of Statistical Software, (2017), 77, 6.
data(USAF)
psych::corPlot(USAF, xlas=3)
psych::omega(USAF[c(4:8,10:19), c(4:8,10:19)])  #just the size variables
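To make the idea of a general factor behind several group factors concrete, the sketch below computes omega hierarchical and omega total by hand from a made-up set of loadings. The loadings are invented and are not the USAF solution. The formulas are the standard ones for a unit-weighted total score, not a transcription of the omega function's internals.

```r
# Made-up loadings for six variables: one general factor plus two group factors
g  <- c(0.6, 0.6, 0.6, 0.5, 0.5, 0.5)         # general factor loadings
F1 <- c(0.4, 0.4, 0.4, 0,   0,   0)           # group factor 1
F2 <- c(0,   0,   0,   0.5, 0.5, 0.5)         # group factor 2

# Model-implied correlation matrix: common variance plus uniqueness on the diagonal
common <- tcrossprod(g) + tcrossprod(F1) + tcrossprod(F2)
R <- common
diag(R) <- 1

total.var <- sum(R)                            # variance of the unit-weighted total score
omega.h <- sum(g)^2 / total.var                # share due to the general factor
omega.t <- (sum(g)^2 + sum(F1)^2 + sum(F2)^2) / total.var   # share due to all common factors
round(c(omega.h = omega.h, omega.total = omega.t), 3)
```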
Wrappers for dirname, file.choose, readLines, file.create, and file.path to be called directly for listing directories, creating files, showing the files in a directory, and listing the content of files in a directory. fileCreate gives the functionality of file.choose(new=TRUE). filesList combines file.choose, dirname, and list.files to show the files in a directory; fileScan extends this and then returns the first few lines of each readable file.
fileScan(f = NULL, nlines = 3, max = NULL, from = 1, filter = NULL)
filesList(f=NULL)
filesInfo(f=NULL, max=NULL)
fileCreate(newName="new.file")
f | File path to use as base path (will use file.choose() if missing). If f is a directory, will list the files in that directory; if f is a file, will find the directory for that file and then list all of those files. |
nlines | How many lines to display |
max | Maximum number of files to display |
from | First file (number) to display |
filter | Just display files with "filter" in the name |
newName | The name of the file to be created. |
Just a collection of simple wrappers to powerful core R functions. Allows the user more direct control of what directory to list, to create a file, or to display the content of files. The functions called include file.choose, file.path, file.info, file.create, dirname, and dir.exists. All of these are very powerful functions, but not easy to call interactively.
fileCreate will ask to locate a file using file.choose, set the directory to that location, and then prompt to create a file with the new name (newName). This is a workaround for file.choose(new=TRUE), which only works for Macs not using RStudio.
filesInfo will interactively search for a file and then list the information (size, date, ownership) of all the files in that directory.
filesList will interactively search for a file and then list all the files in the same directory.
Workarounds for core-R functions for interactive file manipulation
William Revelle
read.file to read in data from a file or read.clipboard from the clipboard. dfOrder to sort data.frames.
if(interactive()) {
#all of these require interactive input and thus are not given as examples
fileCreate("my.new.file.txt")
filesList()  #show the items in the directory where a file is displayed
fileScan()   #show the content of the files in a directory
#or, if you have a file in mind
f <- file.choose()  #go find it
filesList(f)
fileScan(f)
}
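Because the functions above need interactive input, a non-interactive sketch may help show which core R calls are being wrapped. The code below builds a scratch directory under tempdir() and peeks at the first lines of each file. The helper name scan_dir and the file names are invented for illustration. The code is not the fileScan source.

```r
# Work in a scratch directory so nothing outside tempdir() is touched
scratch <- file.path(tempdir(), "fileScan-sketch")
dir.create(scratch, showWarnings = FALSE)
writeLines(c("a,b", "1,2", "3,4", "5,6"), file.path(scratch, "one.csv"))
writeLines(c("first line", "second line"), file.path(scratch, "two.txt"))

# Hypothetical helper: show the first nlines of every file in a directory
scan_dir <- function(dir, nlines = 3, filter = NULL) {
  files <- list.files(dir, full.names = TRUE)
  if (!is.null(filter)) files <- files[grepl(filter, basename(files), fixed = TRUE)]
  out <- lapply(files, readLines, n = nlines, warn = FALSE)
  names(out) <- basename(files)
  out
}

all.files <- scan_dir(scratch)                   # first three lines of each file
csv.only  <- scan_dir(scratch, filter = ".csv")  # only files with ".csv" in the name
all.files
file.info(list.files(scratch, full.names = TRUE))[, c("size", "mtime")]
```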
A classic data set for demonstrating Thurstonian scaling is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford, Nunnally, and Nunnally and Bernstein, this data set allows for examples of basic scaling techniques.
data(vegetables)
A data frame with 9 choices on the following 9 vegetables. The values reflect the percentage of times where the column entry was preferred over the row entry.
Turn
Turnips
Cab
Cabbage
Beet
Beets
Asp
Asparagus
Car
Carrots
Spin
Spinach
S.Beans
String Beans
Peas
Peas
Corn
Corn
Louis L. Thurstone was a pioneer in psychometric theory and measurement of attitudes, interests, and abilities. Among his many contributions was a systematic analysis of the process of comparative judgment (Thurstone, 1927). He considered the case of asking subjects to successively compare pairs of objects. If the same subject does this repeatedly, or if subjects act as random replicates of each other, their judgments can be thought of as sampled from a normal distribution of underlying (latent) scale scores for each object. Thurstone proposed that the comparison between the values of two objects could be represented as the difference of the average values of the objects relative to the standard deviation of the differences between objects. The basic model is that each item has a normal distribution of response strength and that choice represents the stronger of the two response strengths. A justification for the normality assumption is that each decision represents the sum of many independent inputs and thus, through the central limit theorem, is normally distributed.
Thurstone considered five different sets of assumptions about the equality and independence of the variances for each item (Thurstone, 1927). Torgerson expanded this analysis slightly by considering three classes of data collection (within individuals, between individuals, and mixes of within and between) crossed with three sets of assumptions (equal covariance of decision process, equal correlations and small differences in variance, equal variances).
This vegetable data set is used by Guilford and by Nunnally to demonstrate Thurstonian scaling.
Guilford, J.P. (1954) Psychometric Methods. McGraw-Hill, New York.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Revelle, W. An introduction to psychometric theory with applications in R. (in preparation), Springer. https://personality-project.org/r/book/
data(vegetables)
psych::thurstone(veg)
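The scaling logic described in the details can be sketched in a few lines of base R under the simplest set of assumptions (equal variances and uncorrelated judgments, usually called Case V). The proportions below are made up for three objects and are not the vegetables data. The code is an illustration of the idea rather than the psych::thurstone source.

```r
# Made-up preference proportions for three objects (not the vegetables data).
# Entry [i, j] is the proportion of times the column object j was preferred over row object i.
P <- matrix(c(0.50, 0.70, 0.90,
              0.30, 0.50, 0.75,
              0.10, 0.25, 0.50), 3, 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

Z <- qnorm(P)                 # convert proportions to normal deviates
scale.values <- colMeans(Z)   # Case V: average deviate for each column object
scale.values <- scale.values - min(scale.values)   # anchor the lowest object at 0
round(scale.values, 2)
```

Objects that are preferred more often receive larger scale values, and the distances between them are expressed in standard deviation units of the assumed normal response strengths.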
A typical problem in data analysis is to combine two data sets into one. vJoin will combine two matrices or data.frames into one data.frame. Unique column names from set 1 and set 2 are combined as are unique rows. Column names can differ, as can row names. Will match on rownames or a unique key vector. Basically an extension of rbind and cbind without the requirement of matching column and row names. combineMatrices solves a similar problem for correlation matrices.
vJoin(x, y, rnames = TRUE, cnames=TRUE, key.name= NULL)
combineMatrices(x, y, r=NULL)
x | A matrix or data frame with column and row names. |
y | A matrix or data frame with column and row names. |
rnames | If TRUE, the default, match on row names, extend to new names. If FALSE then add the y data following the x data. |
cnames | If TRUE and colnames are NULL, then create unique colnames for x and y |
key.name | If NULL, match on rownames; otherwise, match on the values of the key.name column, which must be unique |
r | Should we add the diagonal of y? |
For X and Y matrices/data.frames with column and row names, combine the two data sets. Match on column and row names if they exist, extend to unique names if they do not match. Can also match on a column in each set (key.name).
Matrices by default do not have column or rownames. They will be created for x and for y (depending upon the rnames and cnames options).
combineMatrices takes a square matrix (x) and combines it with a rectangular matrix (y) to produce a larger xy matrix.
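One way to picture this is as a block matrix with x in the upper left, y in the upper right, and the transpose of y in the lower left. The sketch below is a hypothetical base R helper, not the combineMatrices code. How combineMatrices fills the lower right block is not stated here, so the sketch assumes an identity matrix unless a `yy` block (a name invented for this example) is supplied.

```r
# Hypothetical helper (not the psychTools implementation): assemble a
# symmetric block matrix from a square x and a rectangular y.
combine_blocks <- function(x, y, yy = NULL) {
  stopifnot(nrow(x) == ncol(x), nrow(y) == nrow(x))
  # Assumption: with no y-by-y block supplied, use an identity matrix.
  if (is.null(yy)) yy <- diag(ncol(y))
  rbind(cbind(x, y), cbind(t(y), yy))
}

x <- matrix(c(1.0, 0.3,
              0.3, 1.0), nrow = 2,
            dimnames = list(c("a", "b"), c("a", "b")))
y <- matrix(c(0.2, 0.4), nrow = 2,
            dimnames = list(c("a", "b"), "c"))
xy <- combine_blocks(x, y)
dim(xy)            # one extra row and column for variable c
all(xy == t(xy))   # the assembled matrix is symmetric
```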
xy: a data frame
Inspired by the functionality of full_join and the other related dplyr functions.
William Revelle
X1 <- bfi[1:10,1:5]
Y1 <- bfi[6:15,4:10]
xy <- vJoin(X1,Y1)   #match on rownames
xy1 <- vJoin(X1,Y1,rnames=FALSE)   #add Y1 items after X1 items
x <- matrix(1:30, ncol=5)
y <- matrix(1:40, ncol=8)
vJoin(x,y)
vJoin(x,y,cnames=FALSE)
vJoin(x,y, rnames= FALSE, cnames=FALSE)
R <- cor(sat.act,use="pairwise")
r1 <- R[1:4,1:4]
r2 <- R[1:4,5:6]
newr <- combineMatrices(r1,r2)
Zola et al. (2021) reported the validity of self-report personality items from the SAPA Personality Inventory (SPI) (Condon, 2018) in terms of 30 peer reports on 8 dimensions. Here are the polychoric correlations of these items. SPI items were collected using SAPA procedures for 158,631 participants (mean n/item = 18,180), 908 of whom received peer ratings.
data("zola")
The format is:
 num [1:165, 1:165] 1 -0.242 0.282 0.65 0.223 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:165] "q_253" "q_4296" "q_1855" "q_90" ...
  ..$ : chr [1:165] "q_253" "q_4296" "q_1855" "q_90" ...
The polychoric correlation matrix of the SPI and peer report data. To see the item labels, use the lookupFromKeys function.
This data set is a nice example of a multi-trait, multi-method correlation matrix (see the scoring example). Five dimensions of self report show high correlations with the corresponding peer report scales.
A. Zola, D.M. Condon, and W. Revelle (2021)
A. Zola, D.M. Condon, and W. Revelle (2021) The Convergence of Self and Informant Reports in a Large Online Sample. Collabra: Psychology, 7, 1. doi: 10.1525/collabra.25983
data(zola)
psych::lookupFromKeys(zola.keys,zola.dictionary)
scores <- psych::scoreOverlap(zola.keys[c(1:5,33:37)],zola)   #MTMM of Big 5 scores