Title: | Various Methods for the Two Sample Problem |
---|---|
Description: | The routine twosample_test() in this package runs the two sample test using various test statistics. The p values are found via permutation or large sample theory. The routine twosample_power() calculates the power in various cases, and plot_power() draws the corresponding power graphs. |
Authors: | Wolfgang Rolke [aut, cre] |
Maintainer: | Wolfgang Rolke <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.2.0 |
Built: | 2024-10-22 07:22:29 UTC |
Source: | CRAN |
This function finds the p values of several tests based on large sample theory.
asymptotic_pvalues(x, n, m)
x |
a vector of test statistics |
n |
size of sample 1 |
m |
size of sample 2 |
A vector of p values.
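As an illustration of what a large sample p value looks like, the sketch below computes the asymptotic p value of the two-sample Kolmogorov-Smirnov statistic from its limiting distribution. The helper `ks_asymptotic_pvalue` is hypothetical and written in base R only; it is not the package's internal code, and asymptotic_pvalues() may cover other statistics and use different formulas.

```r
# Hypothetical helper (not part of R2sample): asymptotic p value of the
# two-sample Kolmogorov-Smirnov statistic D for sample sizes n and m,
# using P(sqrt(nm/(n+m)) D > z) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 z^2).
ks_asymptotic_pvalue <- function(D, n, m, terms = 100) {
  z <- sqrt(n * m / (n + m)) * D
  k <- seq_len(terms)
  p <- vapply(z, function(zi) {
    # The series converges slowly near zero, where the p value is 1 anyway.
    if (zi < 0.05) return(1)
    2 * sum((-1)^(k - 1) * exp(-2 * k^2 * zi^2))
  }, numeric(1))
  pmin(pmax(p, 0), 1)  # guard against rounding outside [0, 1]
}

# Usage: take the observed statistic from stats::ks.test
set.seed(1)
x <- rnorm(500)
y <- rt(500, 4)
D <- unname(ks.test(x, y)$statistic)
ks_asymptotic_pvalue(D, length(x), length(y))
```

The approximation is only reasonable when both samples are large, which is consistent with the UseLargeSample argument elsewhere in this package applying to n, m > 10000.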
This function finds the power of the chi-square test for continuous or discrete data.
chi_power( rxy, alpha = 0.05, B = 1000, xparam, yparam, nbins = c(50, 10), minexpcount = 5, typeTS )
rxy |
a function to generate data |
alpha |
=0.05 type I error probability of test |
B |
=1000 number of simulation runs |
xparam |
vector of parameter values |
yparam |
vector of parameter values |
nbins |
=c(50, 10) number of desired bins |
minexpcount |
=5 smallest number of counts required in each bin |
typeTS |
type of problem, continuous/discrete, with/without weights |
A matrix of power values
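To show how the nbins and minexpcount arguments could interact for continuous data, here is a minimal base R sketch of a two-sample chi-square test on equal-probability bins. The function `chisq_two_sample` and its rule for choosing the number of bins are assumptions made for illustration; the package's own binning (including how it combines bins) may differ.

```r
# Hypothetical helper (not part of R2sample): two-sample chi-square test
# on bins with equal pooled counts. The number of bins is reduced, if
# needed, so that the expected count per bin in the smaller sample is
# at least minexpcount.
chisq_two_sample <- function(x, y, nbins = 10, minexpcount = 5) {
  n <- length(x)
  m <- length(y)
  k <- max(2, min(nbins, floor(min(n, m) / minexpcount)))
  pooled <- c(x, y)
  # Bin edges at pooled quantiles; outer edges opened to cover all data
  breaks <- quantile(pooled, probs = seq(0, 1, length.out = k + 1),
                     names = FALSE)
  breaks[1] <- -Inf
  breaks[k + 1] <- Inf
  bin <- cut(pooled, breaks = breaks)
  group <- rep(c("x", "y"), times = c(n, m))
  res <- chisq.test(table(group, bin), correct = FALSE)
  c(statistic = unname(res$statistic),
    df = unname(res$parameter),
    p.value = res$p.value)
}

set.seed(1)
chisq_two_sample(rnorm(200), rnorm(200, 1), nbins = 10)
```

Wrapping such a test in a simulation loop over parameter values, and recording how often the p value falls below alpha, gives power estimates of the kind chi_power() returns.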
This function draws the power graph, with curves sorted by the mean power and smoothed for easier reading.
plot_power(pwr, xname = " ", title, Smooth = TRUE, span = 0.25)
pwr |
a matrix of power values, usually from the twosample_power command |
xname |
Name of variable on x axis |
title |
(Optional) title of graph |
Smooth |
=TRUE lines are smoothed for easier reading |
span |
=0.25 bandwidth of smoothing method |
plt, an object of class ggplot.
Runs the shiny app associated with R2sample package
run_shiny()
No return value, called for side effect of opening a shiny app
This function does some rounding to nice numbers
## S3 method for class 'digits'
signif(x, d = 4)
x |
a list of two vectors |
d |
=4 number of digits to round to |
A list with rounded vectors
Find the power of various two sample tests using Rcpp and parallel computing.
twosample_power( f, ..., TS, TSextra, alpha = 0.05, B = c(1000, 1000), nbins = c(50, 10), minexpcount = 5, UseLargeSample, samplingmethod = "independence", maxProcessor = 10 )
f |
function to generate a list with data sets x, y and (optional) vals, weights |
... |
additional arguments passed to f, up to 2 |
TS |
routine to calculate test statistics for non-chi-square tests |
TSextra |
additional info passed to TS, if necessary |
alpha |
=0.05, the level of the hypothesis test |
B |
=c(1000, 1000), number of simulation runs for power and permutation test. |
nbins |
=c(50,10), number of bins for chi large and chi small. |
minexpcount |
=5 minimum required count for chi square tests |
UseLargeSample |
should p values be found via large sample theory if n,m>10000? |
samplingmethod |
=independence or MCMC in discrete data case |
maxProcessor |
=10, maximum number of cores to use. If maxProcessor=1 no parallel computing is used. |
A numeric vector of power values.
f=function(mu) list(x=rnorm(25), y=rnorm(25, mu))
twosample_power(f, mu=c(0,2), B=c(100, 100), maxProcessor = 1)
f=function(n, p) list(x=table(sample(1:5, size=1000, replace=TRUE)),
    y=table(sample(1:5, size=n, replace=TRUE, prob=c(1, 1, 1, 1, p))),
    vals=1:5)
twosample_power(f, n=c(1000, 2000), p=c(1, 1.5), B=c(100, 100), maxProcessor = 1)
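The idea behind a simulated power calculation can be sketched in base R without the package: generate many data sets for each parameter value, test each one, and record the rejection rate. The helper `simulate_power` and the choice of stats::ks.test as the test are assumptions for illustration only; twosample_power() runs its own set of tests and uses Rcpp and parallel computing.

```r
# Hypothetical helper (not part of R2sample): estimate power by simulation.
# f generates list(x, y) for one parameter value; pvalue_fun returns the
# p value of some two-sample test.
simulate_power <- function(f, param, pvalue_fun, alpha = 0.05, B = 200) {
  power <- vapply(param, function(p) {
    rejections <- replicate(B, {
      dta <- f(p)
      pvalue_fun(dta$x, dta$y) < alpha
    })
    mean(rejections)  # fraction of runs that reject at level alpha
  }, numeric(1))
  names(power) <- param
  power
}

set.seed(123)
f <- function(mu) list(x = rnorm(25), y = rnorm(25, mu))
ks_pvalue <- function(x, y) ks.test(x, y)$p.value
pwr <- simulate_power(f, param = c(0, 1, 2), pvalue_fun = ks_pvalue)
print(pwr)
```

At mu = 0 the null hypothesis is true, so the estimated power there should be close to (or below) alpha, which makes a useful sanity check for any power study.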
This function runs a number of two sample tests using Rcpp and parallel computing.
twosample_test( x, y, vals = NA, TS, TSextra, wx = rep(1, length(x)), wy = rep(1, length(y)), B = 5000, nbins = c(50, 10), maxProcessor, UseLargeSample, samplingmethod = "independence", doMethods = "all" )
x |
a vector of numbers if data is continuous or of counts if data is discrete. |
y |
a vector of numbers if data is continuous or of counts if data is discrete. |
vals |
=NA, a vector of numbers, the values of a discrete random variable. NA if data is continuous. |
TS |
routine to calculate test statistics for non-chi-square tests |
TSextra |
additional info passed to TS, if necessary |
wx |
A numeric vector of weights of x. |
wy |
A numeric vector of weights of y. |
B |
=5000, number of simulation runs for permutation test |
nbins |
=c(50,10), number of bins for chi square tests. |
maxProcessor |
maximum number of cores to use. If missing (the default) no parallel processing is used. |
UseLargeSample |
should p values be found via large sample theory if n,m>10000? |
samplingmethod |
="independence" or "MCMC" for discrete data |
doMethods |
="all" Which methods should be included? If missing all methods are used. |
A list of two numeric vectors, the test statistics and the p values.
R2sample::twosample_test(rnorm(1000), rt(1000, 4), B=1000)
myTS=function(x,y) {
   z=c(mean(x)-mean(y),sd(x)-sd(y))
   names(z)=c("M","S")
   z
}
R2sample::twosample_test(rnorm(1000), rt(1000, 4), TS=myTS, B=1000)
vals=1:5
x=table(sample(vals, size=100, replace=TRUE))
y=table(sample(vals, size=100, replace=TRUE, prob=c(1,1,3,1,1)))
R2sample::twosample_test(x, y, vals)
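For readers unfamiliar with permutation p values, the following base R sketch shows the basic mechanism for continuous data with a user-supplied statistic. The helpers `perm_pvalue` and `myTS_abs` are hypothetical, and the sketch assumes that large values of each statistic count as evidence against the null hypothesis; it is not the package's Rcpp implementation.

```r
# Hypothetical helper (not part of R2sample): permutation p values for a
# statistic TS(x, y) that returns a named numeric vector. Assumes large
# values of each statistic speak against the null hypothesis.
perm_pvalue <- function(x, y, TS, B = 1000) {
  obs <- TS(x, y)
  pooled <- c(x, y)
  n <- length(x)
  perm <- replicate(B, {
    idx <- sample(length(pooled), n)  # random relabeling of the pooled data
    TS(pooled[idx], pooled[-idx])
  })
  perm <- matrix(perm, nrow = length(obs))  # one row per statistic
  pvals <- rowMeans(perm >= obs)
  names(pvals) <- names(obs)
  pvals
}

# Absolute differences, so that large values indicate a discrepancy
myTS_abs <- function(x, y) {
  z <- c(abs(mean(x) - mean(y)), abs(sd(x) - sd(y)))
  names(z) <- c("M", "S")
  z
}

set.seed(42)
perm_pvalue(rnorm(50), rnorm(50, 2), myTS_abs, B = 200)
```

The number of permutations B limits the resolution of the p value: with B runs the smallest nonzero value this sketch can report is 1/B.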
This function runs a number of two sample tests using Rcpp and parallel computing and then finds the correct p value for the combined tests.
twosample_test_adjusted_pvalue( x, y, vals = NA, TS, TSextra, wx = rep(1, length(x)), wy = rep(1, length(y)), B = c(5000, 1000), nbins = c(50, 10), samplingmethod = "independence", doMethods )
x |
a vector of numbers if data is continuous or of counts if data is discrete. |
y |
a vector of numbers if data is continuous or of counts if data is discrete. |
vals |
=NA, a vector of numbers, the values of a discrete random variable. NA if data is continuous. |
TS |
routine to calculate test statistics for non-chi-square tests |
TSextra |
additional info passed to TS, if necessary |
wx |
A numeric vector of weights of x. |
wy |
A numeric vector of weights of y. |
B |
=c(5000, 1000), number of simulation runs for permutation test |
nbins |
=c(50,10), number of bins for chi square tests. |
samplingmethod |
="independence" or "MCMC" for discrete data |
doMethods |
Which methods should be included? |
A list of two numeric vectors, the test statistics and the p values.
x=rnorm(100)
y=rt(200, 4)
R2sample::twosample_test_adjusted_pvalue(x, y, B=c(500, 500))
vals=1:5
x=table(c(1:5, sample(1:5, size=100, replace=TRUE)))-1
y=table(c(1:5, sample(1:5, size=100, replace=TRUE, prob=c(1,1,3,1,1))))-1
R2sample::twosample_test_adjusted_pvalue(x, y, vals, B=c(500, 500))
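When several tests are run on the same data and the smallest p value is reported, that minimum no longer has the nominal type I error rate. One common correction, sketched here in base R, is to simulate the null distribution of the minimum p value by permutation. The helpers `pvalues_three_tests` and `adjusted_min_pvalue`, and the three tests chosen, are assumptions for illustration; the package's own procedure and test collection may differ.

```r
# Hypothetical helpers (not part of R2sample).
# p values of three standard two-sample tests from the stats package
pvalues_three_tests <- function(x, y) {
  c(t = t.test(x, y)$p.value,
    ks = ks.test(x, y)$p.value,
    wilcoxon = wilcox.test(x, y)$p.value)
}

# Adjusted p value: how often is the smallest p value under random
# relabeling at least as small as the observed smallest p value?
adjusted_min_pvalue <- function(x, y, B = 200) {
  obs_min <- min(pvalues_three_tests(x, y))
  pooled <- c(x, y)
  n <- length(x)
  null_min <- replicate(B, {
    idx <- sample(length(pooled), n)
    min(pvalues_three_tests(pooled[idx], pooled[-idx]))
  })
  mean(null_min <= obs_min)
}

set.seed(1)
x <- rnorm(40)
y <- rnorm(40, 1.5)
adjusted_min_pvalue(x, y, B = 100)
```

Unlike a Bonferroni correction, this approach accounts for the correlation between the tests, at the cost of running every test B additional times.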