Title: | Crosstab and Statistical Tests for OII MSc Stats Course |
---|---|
Description: | Provides simple crosstab output with optional statistics (e.g., Goodman-Kruskal Gamma, Somers' d, and Kendall's tau-b) as well as two-way and one-way tables. The package is used within the statistics component of the Master of Science (MSc) in Social Science of the Internet at the Oxford Internet Institute (OII), University of Oxford, but the functions should be useful for general data analysis and especially for analysis of categorical and ordinal data. |
Authors: | Scott Hale [aut, cre], Jon Bright [aut], Grant Blank [aut] |
Maintainer: | Scott Hale <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.2.1 |
Built: | 2024-12-15 07:43:30 UTC |
Source: | CRAN |
This function calculates basic measures of association
association.measures(x, y = NULL, warnings = FALSE)
x |
a table or matrix if y is NULL; otherwise the row variable |
y |
the column variable, a numeric vector used only when x is not a table or matrix |
warnings |
a logical value indicating whether warnings should be shown (defaults to FALSE, no warnings). |
A list with the following elements is returned:
phi |
Phi, a chi-square-based measure of association. |
contingency_coefficient |
Contingency coefficient, a chi-square-based measure of association. |
cramersv |
Cramer's V, a chi-square-based measure of association. |
pairs_total |
Total number of pairs |
pairs_concordant |
Number of concordant pairs |
pairs_discordant |
Number of discordant pairs |
pairs_tied_first |
The number of pairs tied on the first variable (but not both variables) |
pairs_tied_second |
The number of pairs tied on the second variable (but not both variables) |
pairs_tied_both |
The number of pairs tied on both the first and second variables |
minimum_dim |
Minimum dimension of the table (the smaller of the number of rows and the number of columns) |
n |
Number of cases |
gamma |
Goodman-Kruskal Gamma |
somersd |
Somers' d (assuming the column variable is the dependent variable) |
taub |
Kendall's tau-b |
tauc |
Stuart's tau-c |
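The returned measures can be reproduced by hand on a small table. The following is a minimal base R sketch using the textbook formulas, not the package's own code; the table `tab` and all variable names are hypothetical, and the package may differ in details such as continuity corrections.

```r
# Hypothetical 2 x 3 table of counts (illustrative data only)
tab <- matrix(c(10, 20, 30,
                25, 15,  5), nrow = 2, byrow = TRUE)
n <- sum(tab)        # number of cases
m <- min(dim(tab))   # minimum dimension of the table

# Chi-square-based measures (Pearson chi-square, no continuity correction)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
phi         <- sqrt(chi2 / n)
contingency <- sqrt(chi2 / (chi2 + n))
cramers_v   <- sqrt(chi2 / (n * (m - 1)))

# Expand the table to one (row, column) observation per case
cells <- which(tab > 0, arr.ind = TRUE)
x <- rep(cells[, "row"], times = tab[cells])
y <- rep(cells[, "col"], times = tab[cells])

# Classify every unordered pair of cases by brute force
up <- upper.tri(matrix(0, n, n))
dx <- sign(outer(x, x, "-"))[up]
dy <- sign(outer(y, y, "-"))[up]
conc        <- sum(dx * dy > 0)        # concordant pairs
disc        <- sum(dx * dy < 0)        # discordant pairs
tied_first  <- sum(dx == 0 & dy != 0)  # tied on the row variable only
tied_second <- sum(dx != 0 & dy == 0)  # tied on the column variable only

# Ordinal measures built from the pair counts
gamma    <- (conc - disc) / (conc + disc)
taub     <- (conc - disc) /
  sqrt((conc + disc + tied_first) * (conc + disc + tied_second))
somersd  <- (conc - disc) / (conc + disc + tied_second)  # column variable dependent
tauc     <- 2 * m * (conc - disc) / (n^2 * (m - 1))
```

The brute-force pair classification is quadratic in the number of cases, so it is only suitable for small examples like this one.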
See also: oii.xtab, likelihood.test, lambda.test, concordant.pairs, discordant.pairs, tied.pairs
#Create var1 as 200 A's, B's, and C's
var1 <- sample(LETTERS[1:3], size = 200, replace = TRUE)
#Create var2 as 200 numbers in the range 1 to 4
var2 <- sample(1:4, size = 200, replace = TRUE)
#Calculate the measures of association for var1 and var2
association.measures(var1, var2)
The number of concordant pairs in a table or matrix
concordant.pairs(x, y = NULL)
x |
a table or matrix if y is NULL; otherwise the row variable |
y |
the column variable, a numeric vector used only when x is not a table or matrix |
The number of concordant pairs
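To make the definition concrete: two cases are concordant when one is higher than the other on both variables. A minimal base R sketch of the standard counting rule for a table with ordered rows and columns follows; `count_concordant` and `tab` are hypothetical names used for illustration, not the package's implementation.

```r
# Illustrative helper (not part of the oii package): count concordant pairs
# in a two-way table of counts whose rows and columns are in ascending order.
count_concordant <- function(tab) {
  nr <- nrow(tab)
  nc <- ncol(tab)
  total <- 0
  for (i in seq_len(nr - 1)) {
    for (j in seq_len(nc - 1)) {
      # Cases in cell (i, j) are concordant with every case that is
      # higher on both variables, i.e. below and to the right.
      total <- total + tab[i, j] * sum(tab[(i + 1):nr, (j + 1):nc])
    }
  }
  total
}

# Hypothetical 2 x 3 table of counts
tab <- matrix(c(10, 20, 30,
                25, 15,  5), nrow = 2, byrow = TRUE)
count_concordant(tab)
```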
See also: association.measures, discordant.pairs, tied.pairs
The number of discordant pairs in a table or matrix
discordant.pairs(x, y = NULL)
x |
a table or matrix if y is NULL; otherwise the row variable |
y |
the column variable, a numeric vector used only when x is not a table or matrix |
The number of discordant pairs
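Two cases are discordant when one is higher on the first variable but lower on the second. The sketch below applies the standard counting rule in base R; `count_discordant` and `tab` are hypothetical names, and this is an illustration rather than the package's own code.

```r
# Illustrative helper (not part of the oii package): count discordant pairs
# in a two-way table of counts whose rows and columns are in ascending order.
count_discordant <- function(tab) {
  nr <- nrow(tab)
  nc <- ncol(tab)
  total <- 0
  for (i in seq_len(nr - 1)) {
    for (j in seq_len(nc)[-1]) {
      # Cases in cell (i, j) are discordant with every case that is higher
      # on the row variable but lower on the column variable,
      # i.e. below and to the left.
      total <- total + tab[i, j] * sum(tab[(i + 1):nr, 1:(j - 1)])
    }
  }
  total
}

# Hypothetical 2 x 3 table of counts
tab <- matrix(c(10, 20, 30,
                25, 15,  5), nrow = 2, byrow = TRUE)
count_discordant(tab)
```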
See also: association.measures, concordant.pairs, tied.pairs
This package provides a few commands that are used within the MSc course at the Oxford Internet Institute, University of Oxford.
The only functions you're likely to need from oii are oii.summary, oii.freq, and oii.xtab.
This function prints a simple frequency table with totals and percentages
oii.freq(x)
x |
input variable, (usually of class |
A data.frame with one row for each unique value of x. These values of x are assigned to the row.names of the data.frame.
The data.frame also has rows for:
Valid Total |
The total number of non-missing cases (i.e., |
Missing |
The total number of missing/NA cases (i.e., |
Total |
The total number of cases (i.e., |
The data.frame has the following columns:
freq |
The number of cases with this value |
percent |
The percentage of all cases that this value represents |
valid_percent |
The percentage of all valid (i.e., not missing) cases that this value represents |
cum_percent |
The cumulative percentage of valid cases |
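The columns above can be approximated in base R. This is a hedged sketch of how such a table might be built, not the package's implementation; the vector `x`, the object names, and the exact expressions used for the totals are assumptions.

```r
x <- c("A", "B", "A", NA, "C", "A", "B", NA)   # illustrative data

counts <- table(x)         # counts of each non-missing value
valid  <- sum(!is.na(x))   # total number of non-missing cases
miss   <- sum(is.na(x))    # total number of missing cases
total  <- length(x)        # total number of cases

out <- data.frame(
  freq          = as.vector(counts),
  percent       = 100 * as.vector(counts) / total,
  valid_percent = 100 * as.vector(counts) / valid,
  cum_percent   = 100 * cumsum(as.vector(counts)) / valid,
  row.names     = names(counts)
)
out
```

Note that `percent` is computed over all cases, including missing ones, while `valid_percent` and `cum_percent` use only the non-missing cases.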
See also: data.frame, row.names, is.na, length, summary, table
#Create var as 200 A's, B's, and C's
var <- sample(LETTERS[1:3], size = 200, replace = TRUE)
#Generate a frequency table for the counts of A's, B's, and C's
oii.freq(var)
This function is designed to be like the built-in summary function but includes a few additional values.
If the input is not numeric, the built-in summary command is executed.
oii.summary(x, extended = FALSE, warnings = FALSE)
x |
a numeric vector for which summary statistics should be generated. |
extended |
a logical value indicating whether additional statistics should be printed (see Value section). Defaults to FALSE. |
warnings |
a logical value indicating whether warnings should be shown (defaults to FALSE, no warnings). |
If x is not numeric, the built-in summary command is executed.
If x is numeric (that is, is.numeric(x) returns TRUE), then a list with the following elements is returned:
cases |
The number of non-missing values in x |
na |
The number of missing values in x |
mean |
The mean value of x |
sd |
The standard deviation for values in x |
min |
The minimum/smallest value in x |
max |
The maximum/largest value in x |
This function also calculates the following statistics, but they are printed only when extended is set to TRUE:
var |
The variance of x |
median |
The median value of x |
p25 |
The 25th percentile of x |
p75 |
The 75th percentile of x |
skewness |
The skewness coefficient for x |
kurtosis |
The kurtosis coefficient for x |
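The listed statistics can be approximated in base R as follows. This is a sketch under stated assumptions rather than the package's implementation: the data are illustrative, and the skewness and kurtosis shown are the simple moment-based coefficients, which may differ from the estimator the package uses.

```r
x <- c(2, 4, 4, 5, 7, 9, NA)   # illustrative data with one missing value
v <- x[!is.na(x)]              # non-missing values only

stats <- list(
  cases  = length(v),
  na     = sum(is.na(x)),
  mean   = mean(v),
  sd     = sd(v),
  min    = min(v),
  max    = max(v),
  var    = var(v),
  median = median(v),
  p25    = unname(quantile(v, 0.25)),
  p75    = unname(quantile(v, 0.75))
)

# Moment-based skewness and kurtosis (assumption: other estimators exist)
m2 <- mean((v - mean(v))^2)
stats$skewness <- mean((v - mean(v))^3) / m2^1.5
stats$kurtosis <- mean((v - mean(v))^4) / m2^2
str(stats)
```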
See also: summary, min, median, mean, max, sd, is.na, is.numeric, skewness, kurtosis, var
#Generate data from a normal distribution with mean 0 and sd 1
#store the result in a variable called tmp
tmp <- rnorm(500, mean = 0, sd = 1)
#Print the summary statistics about tmp
oii.summary(tmp)
#Print even more summary statistics about tmp
oii.summary(tmp, extended = TRUE)
This function prints a 2-way table with optional cell statistics and measures of association
oii.xtab(r, c = NULL, s = NULL, row = FALSE, col = FALSE, pctcell = FALSE, stats = FALSE, rescell = FALSE, chistd = FALSE, expcell = FALSE, chicell = FALSE, warnings = FALSE, varnames = NULL, include.missing = FALSE, ...)
r |
the row variable. If r is a data.frame, its variables are used in order (see the examples) |
c |
the column variable. |
s |
the split variable, used to produce a three-way table (see the examples) |
row |
Show row percentages? Defaults to FALSE. |
col |
Show column percentages? Defaults to FALSE. |
pctcell |
Print cell percentages? Defaults to FALSE. |
stats |
Print measures of association? Defaults to FALSE. This parameter is ignored either |
rescell |
Print residual cell count under the null hypothesis? Defaults to FALSE. |
chistd |
Print cell standardized residuals for the Pearson chi-square? Defaults to FALSE. |
expcell |
Print expected cell count under the null hypothesis? Defaults to FALSE. |
chicell |
Print cell contribution to the Pearson chi-square? Defaults to FALSE. |
warnings |
a logical value indicating whether warnings should be shown (defaults to FALSE, no warnings). |
varnames |
Names used to refer to |
include.missing |
Set to TRUE to include factor levels with no instances in the output. Default (FALSE) excludes them. |
... |
Additional parameters to be passed to |
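The optional cell statistics correspond to standard quantities that can be computed in base R. The sketch below is illustrative only: the table and object names are hypothetical, and treating the standardized residual as the Pearson residual (observed minus expected, divided by the square root of expected) is an assumption about what chistd reports.

```r
# Hypothetical 2 x 3 table of counts with labelled dimensions
tab <- matrix(c(10, 20, 30,
                25, 15,  5), nrow = 2, byrow = TRUE,
              dimnames = list(var1 = c("A", "B"), var2 = c("1", "2", "3")))

row_pct  <- 100 * prop.table(tab, margin = 1)   # row percentages
col_pct  <- 100 * prop.table(tab, margin = 2)   # column percentages
cell_pct <- 100 * prop.table(tab)               # cell percentages

test     <- chisq.test(tab, correct = FALSE)
expected <- test$expected              # expected counts under independence
residual <- tab - expected             # observed minus expected
std_res  <- residual / sqrt(expected)  # Pearson residuals
chi_cell <- std_res^2                  # each cell's share of the chi-square

# The cell contributions sum to the Pearson chi-square statistic
sum(chi_cell)
test$statistic
```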
See also: association.measures, CrossTable, likelihood.test, lambda.test
#Create var1 as 200 A's, B's, and C's
var1 <- sample(LETTERS[1:3], size = 200, replace = TRUE)
#Create var2 as 200 numbers in the range 1 to 4
var2 <- sample(1:4, size = 200, replace = TRUE)
#Print a simple 2-way table of var1 and var2
oii.xtab(var1, var2)
#Print the row and column percents
oii.xtab(var1, var2, row = TRUE, col = TRUE)
#Print measures of association statistics
oii.xtab(var1, var2, stats = TRUE)
#If the variables are part of a data.frame
my.data.frame <- data.frame(var1, var2)
#We can use the $ to get the variables
oii.xtab(my.data.frame$var1, my.data.frame$var2)
#or use the with(...) command to save some typing
with(my.data.frame, oii.xtab(var1, var2))
#Three-way tables are also possible
#Create var3 as 200 "I"'s, "II"'s, and "III"'s
var3 <- sample(c("I", "II", "III"), size = 200, replace = TRUE)
oii.xtab(var1, var2, var3)
#We can also pass in a data.frame directly as the first argument
my.data.frame <- data.frame(var1, var2, var3)
oii.xtab(my.data.frame, stats = TRUE)
#The variables in the data.frame are used in order;
#so, sometimes it is useful to re-order them. For example,
oii.xtab(my.data.frame[, c("var3", "var1", "var2")], stats = TRUE)
#Of course, it is also possible to pass in the variables one
#at a time or use with(...) as shown above.
The number of tied pairs, a measure of association
tied.pairs(x, y = NULL)
x |
a table or matrix if y is NULL; otherwise the first variable |
y |
the second variable, a numeric vector used only when x is not a table or matrix |
A list with the following values:
first |
The number of pairs tied on the first variable, but not both variables |
second |
The number of pairs tied on the second variable, but not both variables |
both |
The number of pairs tied on both the first and second variables |
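The three tie counts have simple closed forms in terms of the cell, row, and column totals. The following base R sketch uses those textbook formulas on a hypothetical table; it is not the package's code, and the variable names are chosen only to mirror the list above.

```r
# Hypothetical 2 x 3 table of counts
tab <- matrix(c(10, 20, 30,
                25, 15,  5), nrow = 2, byrow = TRUE)

n           <- sum(tab)
pairs_total <- n * (n - 1) / 2

# Pairs of cases in the same cell are tied on both variables
both   <- sum(tab * (tab - 1) / 2)
# Pairs in the same row are tied on the first variable;
# subtracting `both` leaves those tied on the first variable only
first  <- sum(rowSums(tab) * (rowSums(tab) - 1) / 2) - both
# Likewise for columns and the second variable
second <- sum(colSums(tab) * (colSums(tab) - 1) / 2) - both

# Every remaining pair is either concordant or discordant
untied <- pairs_total - first - second - both
c(first = first, second = second, both = both, untied = untied)
```

Because every pair falls into exactly one category, the untied count equals the sum of the concordant and discordant pairs, which gives a quick consistency check on the other functions.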
See also: association.measures, concordant.pairs, discordant.pairs