Title: | 'SAS' Linear Model |
---|---|
Description: | This is a core implementation of 'SAS' procedures for linear models - GLM, REG, ANOVA, TTEST, FREQ, and UNIVARIATE. Some R packages provide type II and type III SS. However, the results of nested and complex designs are often different from those of 'SAS.' Different results does not necessarily mean incorrectness. However, many wants the same results to SAS. This package aims to achieve that. Reference: Littell RC, Stroup WW, Freund RJ (2002, ISBN:0-471-22174-0). |
Authors: | Kyun-Seop Bae [aut, cre] |
Maintainer: | Kyun-Seop Bae <[email protected]> |
License: | GPL-3 |
Version: | 0.10.5 |
Built: | 2024-12-02 06:54:07 UTC |
Source: | CRAN |
This is a core implementation of 'SAS' procedures for linear models - GLM, REG, and ANOVA. Some packages provide type II and type III SS. However, the results of nested and complex designs are often different from those of 'SAS'. A different result does not necessarily mean incorrectness. However, many want the same result with 'SAS'. This package aims to achieve that. Reference: Littell RC, Stroup WW, Freund RJ (2002, ISBN:0-471-22174-0).
This will serve those who want SAS PROC GLM, REG, and ANOVA in R.
Kyun-Seop Bae [email protected]
## SAS PROC GLM Script for Typical Bioequivalence Data # PROC GLM DATA=BEdata; # CLASS SEQ SUBJ PRD TRT; # MODEL LNCMAX = SEQ SUBJ(SEQ) PRD TRT; # RANDOM SUBJ(SEQ)/TEST; # LSMEANS TRT / DIFF=CONTROL("R") CL ALPHA=0.1; # ODS OUTPUT LSMeanDiffCL=LSMD; # DATA LSMD; SET LSMD; # PE = EXP(DIFFERENCE); # LL = EXP(LowerCL); # UL = EXP(UpperCL); # PROC PRINT DATA=LSMD; RUN; ## ## SAS PROC GLM equivalent BEdata = af(BEdata, c("SEQ", "SUBJ", "PRD", "TRT")) # Columns as factor formula1 = log(CMAX) ~ SEQ/SUBJ + PRD + TRT # Model GLM(formula1, BEdata) # ANOVA tables of Type I, II, III SS RanTest(formula1, BEdata, Random="SUBJ") # Hypothesis test with SUBJ as random ci0 = CIest(formula1, BEdata, "TRT", c(-1, 1), 0.90) # 90$ CI exp(ci0[, c("Estimate", "Lower CL", "Upper CL")]) # 90% CI of GMR ## 'nlme' or SAS PROC MIXED is preferred for an unbalanced case ## SAS PROC MIXED equivalent # require(nlme) # Result = lme(log(CMAX) ~ SEQ + PRD + TRT, random=~1|SUBJ, data=BEdata) # summary(Result) # VarCorr(Result) # ci = intervals(Result, 0.90) ; ci # exp(ci$fixed["TRTT",]) ##
## SAS PROC GLM Script for Typical Bioequivalence Data # PROC GLM DATA=BEdata; # CLASS SEQ SUBJ PRD TRT; # MODEL LNCMAX = SEQ SUBJ(SEQ) PRD TRT; # RANDOM SUBJ(SEQ)/TEST; # LSMEANS TRT / DIFF=CONTROL("R") CL ALPHA=0.1; # ODS OUTPUT LSMeanDiffCL=LSMD; # DATA LSMD; SET LSMD; # PE = EXP(DIFFERENCE); # LL = EXP(LowerCL); # UL = EXP(UpperCL); # PROC PRINT DATA=LSMD; RUN; ## ## SAS PROC GLM equivalent BEdata = af(BEdata, c("SEQ", "SUBJ", "PRD", "TRT")) # Columns as factor formula1 = log(CMAX) ~ SEQ/SUBJ + PRD + TRT # Model GLM(formula1, BEdata) # ANOVA tables of Type I, II, III SS RanTest(formula1, BEdata, Random="SUBJ") # Hypothesis test with SUBJ as random ci0 = CIest(formula1, BEdata, "TRT", c(-1, 1), 0.90) # 90$ CI exp(ci0[, c("Estimate", "Lower CL", "Upper CL")]) # 90% CI of GMR ## 'nlme' or SAS PROC MIXED is preferred for an unbalanced case ## SAS PROC MIXED equivalent # require(nlme) # Result = lme(log(CMAX) ~ SEQ + PRD + TRT, random=~1|SUBJ, data=BEdata) # summary(Result) # VarCorr(Result) # ci = intervals(Result, 0.90) ; ci # exp(ci$fixed["TRTT",]) ##
Conveniently convert some columns of data.frame into factors.
af(DataFrame, Cols)
af(DataFrame, Cols)
DataFrame |
a |
Cols |
column names or indices to be converted |
It performs conversion of some columns in a data.frame
into factors conveniently.
Returns a data.frame
with converted columns.
Kyun-Seop Bae [email protected]
ANOVA with Type I SS.
aov1(Formula, Data, BETA=FALSE, Resid=FALSE)
aov1(Formula, Data, BETA=FALSE, Resid=FALSE)
Formula |
a conventional formula for a linear model. |
Data |
a |
BETA |
if |
Resid |
if |
It performs the core function of SAS PROC GLM, and returns Type I SS. This accepts continuous independent variables also.
The result table is comparable to that of SAS PROC ANOVA.
Df |
degree of freedom |
Sum Sq |
sum of square for the set of contrasts |
Mean Sq |
mean square |
F value |
F value for the F distribution |
Pr(>F) |
proability of larger than F value |
Next returns are optional.
Parameter |
Parameter table with standard error, t value, p value. |
Fitted |
Fitted value or y hat. This is returned only with Resid=TRUE option. |
Residual |
Weigthed residuals. This is returned only with Resid=TRUE option. |
Kyun-Seop Bae [email protected]
aov1(uptake ~ Plant + Type + Treatment + conc, CO2) aov1(uptake ~ Plant + Type + Treatment + conc, CO2, BETA=TRUE) aov1(uptake ~ Plant + Type + Treatment + conc, CO2, Resid=TRUE) aov1(uptake ~ Plant + Type + Treatment + conc, CO2, BETA=TRUE, Resid=TRUE)
aov1(uptake ~ Plant + Type + Treatment + conc, CO2) aov1(uptake ~ Plant + Type + Treatment + conc, CO2, BETA=TRUE) aov1(uptake ~ Plant + Type + Treatment + conc, CO2, Resid=TRUE) aov1(uptake ~ Plant + Type + Treatment + conc, CO2, BETA=TRUE, Resid=TRUE)
ANOVA with Type II SS.
aov2(Formula, Data, BETA=FALSE, Resid=FALSE)
aov2(Formula, Data, BETA=FALSE, Resid=FALSE)
Formula |
a conventional formula for a linear model. |
Data |
a |
BETA |
if |
Resid |
if |
It performs the core function of SAS PROC GLM, and returns Type II SS. This accepts continuous independent variables also.
The result table is comparable to that of SAS PROC ANOVA.
Df |
degree of freedom |
Sum Sq |
sum of square for the set of contrasts |
Mean Sq |
mean square |
F value |
F value for the F distribution |
Pr(>F) |
proability of larger than F value |
Next returns are optional.
Parameter |
Parameter table with standard error, t value, p value. |
Fitted |
Fitted value or y hat. This is returned only with Resid=TRUE option. |
Residual |
Weigthed residuals. This is returned only with Resid=TRUE option. |
Kyun-Seop Bae [email protected]
aov2(uptake ~ Plant + Type + Treatment + conc, CO2) aov2(uptake ~ Plant + Type + Treatment + conc, CO2, BETA=TRUE) aov2(uptake ~ Plant + Type + Treatment + conc, CO2, Resid=TRUE) aov2(uptake ~ Plant + Type + Treatment + conc, CO2, BETA=TRUE, Resid=TRUE) aov2(uptake ~ Type, CO2) aov2(uptake ~ Type - 1, CO2)
aov2(uptake ~ Plant + Type + Treatment + conc, CO2) aov2(uptake ~ Plant + Type + Treatment + conc, CO2, BETA=TRUE) aov2(uptake ~ Plant + Type + Treatment + conc, CO2, Resid=TRUE) aov2(uptake ~ Plant + Type + Treatment + conc, CO2, BETA=TRUE, Resid=TRUE) aov2(uptake ~ Type, CO2) aov2(uptake ~ Type - 1, CO2)
ANOVA with Type III SS.
aov3(Formula, Data, BETA=FALSE, Resid=FALSE)
aov3(Formula, Data, BETA=FALSE, Resid=FALSE)
Formula |
a conventional formula for a linear model. |
Data |
a |
BETA |
if |
Resid |
if |
It performs the core function of SAS PROC GLM, and returns Type III SS. This accepts continuous independent variables also.
The result table is comparable to that of SAS PROC ANOVA.
Df |
degree of freedom |
Sum Sq |
sum of square for the set of contrasts |
Mean Sq |
mean square |
F value |
F value for the F distribution |
Pr(>F) |
proability of larger than F value |
Next returns are optional.
Parameter |
Parameter table with standard error, t value, p value. |
Fitted |
Fitted value or y hat. This is returned only with Resid=TRUE option. |
Residual |
Weigthed residuals. This is returned only with Resid=TRUE option. |
Kyun-Seop Bae [email protected]
aov3(uptake ~ Plant + Type + Treatment + conc, CO2) aov3(uptake ~ Plant + Type + Treatment + conc, CO2, BETA=TRUE) aov3(uptake ~ Plant + Type + Treatment + conc, CO2, Resid=TRUE) aov3(uptake ~ Plant + Type + Treatment + conc, CO2, BETA=TRUE, Resid=TRUE)
aov3(uptake ~ Plant + Type + Treatment + conc, CO2) aov3(uptake ~ Plant + Type + Treatment + conc, CO2, BETA=TRUE) aov3(uptake ~ Plant + Type + Treatment + conc, CO2, Resid=TRUE) aov3(uptake ~ Plant + Type + Treatment + conc, CO2, BETA=TRUE, Resid=TRUE)
The data is from 'Canner PL. An overview of six clinical trials of aspirin in coronary heart disease. Stat Med. 1987'
aspirinCHD
aspirinCHD
A data frame with 6 rows.
y1
death event count of aspirin group
n1
total subjet of aspirin group
y2
death event count of placebo group
n2
total subject of placebo group
This data is for educational purpose.
Canner PL. An overview of six clinical trials of aspirin in coronary heart disease. Stat Med. 1987;6:255-263.
Contains Cmax data from a real bioequivalence study.
BEdata
BEdata
A data frame with 91 observations on the following 6 variables.
ADM
Admission or Hospitalization Group Code: 1, 2, or 3
SEQ
Group or Sequence character code: 'RT' or 'TR"
PRD
Period numeric value: 1 or 2
TRT
Treatment or Drug code: 'R' or 'T'
SUBJ
Subject ID
CMAX
Cmax values
This contains a real data of 2x2 bioequivalence study, which has three different hospitalization groups. See Bae KS, Kang SH. Bioequivalence data analysis for the case of separate hospitalization. Transl Clin Pharmacol. 2017;25(2):93-100. doi.org/10.12793/tcp.2017.25.2.93
Trailing zeros after integer is somewhat annoying. This removes those in the vector of strings.
bk(ktab, rpltag=c("n", "N"), dig=10)
bk(ktab, rpltag=c("n", "N"), dig=10)
ktab |
an output of |
rpltag |
tag string of replacement rows. This is usually "n" which means the sample count. |
dig |
maximum digits of decimals in the |
This is convenient if used with tsum0, tsum1, tsum2, tsum3, This requires knitr::kable
.
A new processed vector of strings. The class is still knitr_kable
.
Kyun-Seop Bae [email protected]
## OUTPUT example # t0 = tsum0(CO2, "uptake", c("mean", "median", "sd", "length", "min", "max")) # bk(kable(t0)) # requires knitr package # # | | x| # |:------|--------:| # |mean | 27.21310| # |median | 28.30000| # |sd | 10.81441| # |n | 84 | # |min | 7.70000| # |max | 45.50000| # t1 = tsum(uptake ~ Treatment, CO2, # e=c("mean", "median", "sd", "min", "max", "length"), # ou=c("chilled", "nonchilled"), # repl=list(c("median", "length"), c("med", "N"))) # # bk(kable(t1, digits=3)) # requires knitr package # # | | chilled| nonchilled| Combined| # |:----|-------:|----------:|--------:| # |mean | 23.783| 30.643| 27.213| # |med | 19.700| 31.300| 28.300| # |sd | 10.884| 9.705| 10.814| # |min | 7.700| 10.600| 7.700| # |max | 42.400| 45.500| 45.500| # |N | 42 | 42 | 84 |
## OUTPUT example # t0 = tsum0(CO2, "uptake", c("mean", "median", "sd", "length", "min", "max")) # bk(kable(t0)) # requires knitr package # # | | x| # |:------|--------:| # |mean | 27.21310| # |median | 28.30000| # |sd | 10.81441| # |n | 84 | # |min | 7.70000| # |max | 45.50000| # t1 = tsum(uptake ~ Treatment, CO2, # e=c("mean", "median", "sd", "min", "max", "length"), # ou=c("chilled", "nonchilled"), # repl=list(c("median", "length"), c("med", "N"))) # # bk(kable(t1, digits=3)) # requires knitr package # # | | chilled| nonchilled| Combined| # |:----|-------:|----------:|--------:| # |mean | 23.783| 30.643| 27.213| # |med | 19.700| 31.300| 28.300| # |sd | 10.884| 9.705| 10.814| # |min | 7.700| 10.600| 7.700| # |max | 42.400| 45.500| 45.500| # |N | 42 | 42 | 84 |
GLM, REG, aov1 etc. functions can be run by levels of a variable.
BY(FUN, Formula, Data, By, ...)
BY(FUN, Formula, Data, By, ...)
FUN |
Function name to be called such as GLM, REG |
Formula |
a conventional formula for a linear model. |
Data |
a |
By |
a variable name in the |
... |
arguments to be passed to |
This mimics SAS procedues' BY clause.
a list of FUN
function outputs. The names are after each level.
Kyun-Seop Bae [email protected]
BY(GLM, uptake ~ Treatment + as.factor(conc), CO2, By="Type") BY(REG, uptake ~ conc, CO2, By="Type")
BY(GLM, uptake ~ Treatment + as.factor(conc), CO2, By="Type") BY(REG, uptake ~ conc, CO2, By="Type")
Get point estimate and its confidence interval with given contrast and alpha value using t distribution.
CIest(Formula, Data, Term, Contrast, conf.level=0.95)
CIest(Formula, Data, Term, Contrast, conf.level=0.95)
Formula |
a conventional formula for a linear model |
Data |
a |
Term |
a factor name to be estimated |
Contrast |
a level vector. Level is alphabetically ordered by default. |
conf.level |
confidence level of confidence interval |
Get point estimate and its confidence interval with given contrast and alpha value using t distribution.
Estimate |
point estimate of the input linear contrast |
Lower CL |
lower confidence limit |
Upper CL |
upper confidence limit |
Std. Error |
standard error of the point estimate |
t value |
value for t distribution |
Df |
degree of freedom |
Pr(>|t|) |
probability of larger than absolute t value from t distribution with residual's degree of freedom |
Kyun-Seop Bae [email protected]
CIest(log(CMAX) ~ SEQ/SUBJ + PRD + TRT, BEdata, "TRT", c(-1, 1), 0.90) # 90% CI
CIest(log(CMAX) ~ SEQ/SUBJ + PRD + TRT, BEdata, "TRT", c(-1, 1), 0.90) # 90% CI
Collearity diagnostics with tolerance, VIF, eigenvalue, condition index, variance proportions
Coll(Formula, Data)
Coll(Formula, Data)
Formula |
fomula of the model |
Data |
input data as a matrix or data.frame |
Sometimes collinearity diagnostics after multiple linear regression are necessary.
Tol |
tolerance of independent variables |
VIF |
variance inflation factor of independent variables |
Eigenvalue |
eigenvalue of Z'Z (crossproduct) of standardized independent variables |
Cond. Index |
condition index |
Proportions of variances |
under the names of coefficients |
Kyun-Seop Bae [email protected]
Coll(mpg ~ disp + hp + drat + wt + qsec, mtcars)
Coll(mpg ~ disp + hp + drat + wt + qsec, mtcars)
Do F test with a given set of contrasts.
CONTR(L, Formula, Data, mu=0)
CONTR(L, Formula, Data, mu=0)
L |
contrast matrix. Each row is a contrast. |
Formula |
a conventional formula for a linear model |
Data |
a |
mu |
a vector of mu for the hypothesis L. The length should be equal to the row count of L. |
It performs F test with a given set of contrasts (a matrix). It is similar to the CONTRAST clause of SAS PROC GLM. This can test the hypothesis that the linear combination (function)'s mean vector is mu.
Returns sum of square and its F value and p-value.
Df |
degree of freedom |
Sum Sq |
sum of square for the set of contrasts |
Mean Sq |
mean square |
F value |
F value for the F distribution |
Pr(>F) |
proability of larger than F value |
Kyun-Seop Bae [email protected]
CONTR(t(c(0, -1, 1)), uptake ~ Type, CO2) # sum of square GLM(uptake ~ Type, CO2) # compare with the above
CONTR(t(c(0, -1, 1)), uptake ~ Type, CO2) # sum of square GLM(uptake ~ Type, CO2) # compare with the above
Testing correlation between numeric columns of data with Pearson method.
Cor.test(Data, conf.level=0.95)
Cor.test(Data, conf.level=0.95)
Data |
a matrix or a data.frame |
conf.level |
confidence level |
It uses all numeric columns of input data. It uses "pairwise.complete.obs" rows.
Row names show which columns are used for the test
Estimate |
point estimate of correlation |
Lower CL |
upper confidence limit |
Upper CL |
lower condidence limit |
t value |
t value of the t distribution |
Df |
degree of freedom |
Pr(>|t|) |
probability with the t distribution |
Kyun-Seop Bae [email protected]
Cor.test(mtcars)
Cor.test(mtcars)
Testing correlation between two numeric vectors by Fisher's Z transformation
corFisher(x, y, conf.level=0.95, rho=0)
corFisher(x, y, conf.level=0.95, rho=0)
x |
the first input numeric vector |
y |
the second input numeric vector |
conf.level |
confidence level |
rho |
population correlation rho under null hypothesis |
This accepts only two numeric vectors.
N |
sample size, length of input vectors |
r |
sample correlation |
Fisher.z |
Fisher's z |
bias |
bias to correct |
rho.hat |
point estimate of population rho |
conf.level |
confidence level for the confidence interval |
lower |
lower limit of confidence interval |
upper |
upper limit of confidence interval |
rho0 |
population correlation rho under null hypothesis |
p.value |
p value under the null hypothesis |
Kyun-Seop Bae [email protected]
Fisher RA. Statistical Methods for Research Workers. 14e. 1973
corFisher(mtcars$disp, mtcars$hp, rho=0.6)
corFisher(mtcars$disp, mtcars$hp, rho=0.6)
Calculates sum of squares of a contrast from a lfit
result.
cSS(K, rx, mu=0, eps=1e-8)
cSS(K, rx, mu=0, eps=1e-8)
K |
contrast matrix. Each row is a contrast. |
rx |
a result of |
mu |
a vector of mu for the hypothesis K. The length should be equal to the row count of K. |
eps |
Less than this value is considered as zero. |
It calculates sum of squares with given a contrast matrix and a lfit
result. It corresponds to SAS PROC GLM CONTRAST. This can test the hypothesis that the linear combination (function)'s mean vector is mu.
Returns sum of square and its F value and p-value.
Df |
degree of freedom |
Sum Sq |
sum of square for the set of contrasts |
Mean Sq |
mean square |
F value |
F value for the F distribution |
Pr(>F) |
proability of larger than F value |
Kyun-Seop Bae [email protected]
rx = REG(uptake ~ Type, CO2, summarize=FALSE) cSS(t(c(0, -1, 1)), rx) # sum of square GLM(uptake ~ Type, CO2) # compare with the above
rx = REG(uptake ~ Type, CO2, summarize=FALSE) cSS(t(c(0, -1, 1)), rx) # sum of square GLM(uptake ~ Type, CO2) # compare with the above
Cumulative alpha values with repeated hypothesis with a fixed upper bound z-value.
CumAlpha(x, K=2, side=2)
CumAlpha(x, K=2, side=2)
x |
fixed upper z-value bound for the repeated hypothesis test |
K |
total number of tests |
side |
1=one-side test, 2=two-side test |
It calculates cumulative alpha-values for the even-interval repeated hypothesis test with a fixed upper bound z-value. It assumes linear (proportional) increase of information amount and Brownian motion of z-value, i.e. the correlation is sqrt(t_i/t_j).
The result is a matrix.
ti |
time of test, Even-interval is assumed. |
cum.alpha |
cumulative alpha valued |
Kyun-Seop Bae [email protected]
Reboussin DM, DeMets DL, Kim K, Lan KKG. Computations for group sequential boundaries using the Lan-DeMets function method. Controlled Clinical Trials. 2000;21:190-207.
CumAlpha(x=qnorm(1 - 0.05/2), K=10) # two-side Z-test with alpha=0.05 for ten times
CumAlpha(x=qnorm(1 - 0.05/2), K=10) # two-side Z-test with alpha=0.05 for ten times
Coefficient of variation in percentage.
CV(y)
CV(y)
y |
a numeric vector |
It removes NA
.
Coefficient of variation in percentage.
Kyun-Seop Bae [email protected]
CV(mtcars$mpg)
CV(mtcars$mpg)
Plot pairwise differences by a common.
Diffogram(Formula, Data, Term, conf.level=0.95, adj="lsd", ...)
Diffogram(Formula, Data, Term, conf.level=0.95, adj="lsd", ...)
Formula |
a conventional formula for a linear model |
Data |
a |
Term |
a factor name to be estimated |
conf.level |
confidence level of confidence interval |
adj |
"lsd", "tukey", "scheffe", "bon", or "duncan" to adjust p-value and confidence limit |
... |
arguments to be passed to |
This usually shows the shortest interval. It corresponds to SAS PROC GLM PDIFF. For adjust method "dunnett", see PDIFF
function.
no return value, but a plot on the current device
Kyun-Seop Bae [email protected]
Diffogram(uptake ~ Type*Treatment + as.factor(conc), CO2, "as.factor(conc)")
Diffogram(uptake ~ Type*Treatment + as.factor(conc), CO2, "as.factor(conc)")
Calculate the drift value with given upper bounds (z-valuse), times of test, and power.
Drift(bi, ti=NULL, Power=0.9)
Drift(bi, ti=NULL, Power=0.9)
bi |
upper bound z-values |
ti |
times of test. These should be in the range of [0, 1]. If omitted, even-interval is assumed. |
Power |
target power at the final test |
It calculates the drift value with given upper bound z-values, times of test, and power. If the times of test is not given, even-interval is assumed. mvtnorm::pmvt
(with noncentrality) is better than pmvnorm in calculating power and sample size. But, Lan-DeMets used multi-variate normal rather than multi-variate noncentral t distributionh. This function followed Lan-DeMets for the consistency with previous results.
Drift value for the given condition
Kyun-Seop Bae [email protected]
Reboussin DM, DeMets DL, Kim K, Lan KKG. Computations for group sequential boundaries using the Lan-DeMets function method. Controlled Clinical Trials. 2000;21:190-207.
Drift(seqBound(ti=(1:5)/5)[, "up.bound"])
Drift(seqBound(ti=(1:5)/5)[, "up.bound"])
Makes a contrast matrix for type I SS using forward Doolittle method.
e1(XpX, eps=1e-8)
e1(XpX, eps=1e-8)
XpX |
crossprodut of a design or model matrix. This should have appropriate column names. |
eps |
Less than this value is considered as zero. |
It makes a contrast matrix for type I SS. If zapsmall is used, the result becomes more inaccurate.
A contrast matrix for type I SS.
Kyun-Seop Bae [email protected]
x = ModelMatrix(uptake ~ Plant + Type + Treatment + conc, CO2) round(e1(crossprod(x$X)), 12)
x = ModelMatrix(uptake ~ Plant + Type + Treatment + conc, CO2) round(e1(crossprod(x$X)), 12)
Makes a contrast matrix for type II SS.
e2(x, eps=1e-8)
e2(x, eps=1e-8)
x |
an output of ModelMatrix |
eps |
Less than this value is considered as zero. |
It makes a contrast matrix for type II SS. If zapsmall is used, the result becomes more inaccurate.
A contrast matrix for type II SS.
Kyun-Seop Bae [email protected]
round(e2(ModelMatrix(uptake ~ Plant + Type + Treatment + conc, CO2)), 12) round(e2(ModelMatrix(uptake ~ Type, CO2)), 12) round(e2(ModelMatrix(uptake ~ Type - 1, CO2)), 12)
round(e2(ModelMatrix(uptake ~ Plant + Type + Treatment + conc, CO2)), 12) round(e2(ModelMatrix(uptake ~ Type, CO2)), 12) round(e2(ModelMatrix(uptake ~ Type - 1, CO2)), 12)
Makes a contrast matrix for type III SS.
e3(x, eps=1e-8)
e3(x, eps=1e-8)
x |
an output of ModelMatrix |
eps |
Less than this value is considered as zero. |
It makes a contrast matrix for type III SS. If zapsmall is used, the result becomes more inaccurate.
A contrast matrix for type III SS.
Kyun-Seop Bae [email protected]
round(e3(ModelMatrix(uptake ~ Plant + Type + Treatment + conc, CO2)), 12)
round(e3(ModelMatrix(uptake ~ Plant + Type + Treatment + conc, CO2)), 12)
Calculates a formula table for expected mean square of the given contrast. The default is for Type III SS.
EMS(Formula, Data, Type=3, eps=1e-8)
EMS(Formula, Data, Type=3, eps=1e-8)
Formula |
a conventional formula for a linear model |
Data |
a |
Type |
type of sum of squares. The default is 3. Type 4 is not supported yet. |
eps |
Less than this value is considered as zero. |
This is necessary for further hypothesis tests of nesting factors.
A coefficient matrix for Type III expected mean square
Kyun-Seop Bae [email protected]
f1 = log(CMAX) ~ SEQ/SUBJ + PRD + TRT EMS(f1, BEdata) EMS(f1, BEdata, Type=1) EMS(f1, BEdata, Type=2)
f1 = log(CMAX) ~ SEQ/SUBJ + PRD + TRT EMS(f1, BEdata) EMS(f1, BEdata, Type=1) EMS(f1, BEdata, Type=2)
Estimates Linear Functions with a given GLM result.
est(L, X, rx, conf.level=0.95, adj="lsd", paired=FALSE)
est(L, X, rx, conf.level=0.95, adj="lsd", paired=FALSE)
L |
a matrix of linear contrast rows to be tested |
X |
a model (design) matrix from |
rx |
a result of |
conf.level |
confidence level of confidence limit |
adj |
adjustment method for grouping. This supports "tukey", "bon", "scheffe", "duncan", and "dunnett". This only affects grouping, not the confidence interval. |
paired |
If this is TRUE, L matrix is for the pairwise comparison such as PDIFF function. |
It tests rows of linear function. Linear function means linear combination of estimated coefficients. It corresponds to SAS PROC GLM ESTIMATE. Same sample size per group is assumed for the Tukey adjustment.
Estimate |
point estimate of the input linear contrast |
Lower CL |
lower confidence limit by "lsd" method |
Upper CL |
upper confidence limit by "lsd" method |
Std. Error |
standard error of the point estimate |
t value |
value for t distribution for other than "scheffe" method |
F value |
value for F distribution for "scheffe" method only |
Df |
degree of freedom of residuals |
Pr(>|t|) |
probability of larger than absolute t value from t distribution with residual's degree of freedom, for other than "scheffe" method |
Pr(>F) |
probability of larger than F value from F distribution with residual's degree of freedom, for "scheffe" method only |
Kyun-Seop Bae [email protected]
x = ModelMatrix(uptake ~ Type, CO2) rx = REG(uptake ~ Type, CO2, summarize=FALSE) est(t(c(0, -1, 1)), x$X, rx) # Quebec - Mississippi t.test(uptake ~ Type, CO2) # compare with the above
x = ModelMatrix(uptake ~ Type, CO2) rx = REG(uptake ~ Type, CO2, summarize=FALSE) est(t(c(0, -1, 1)), x$X, rx) # Quebec - Mississippi t.test(uptake ~ Type, CO2) # compare with the above
Estimates Linear Function with a formula and a dataset.
ESTM(L, Formula, Data, conf.level=0.95)
ESTM(L, Formula, Data, conf.level=0.95)
L |
a matrix of linear functions rows to be tested |
Formula |
a conventional formula for a linear model |
Data |
a |
conf.level |
confidence level of confidence limit |
It tests rows of linear functions. Linear function means linear combination of estimated coefficients. It is similar to SAS PROC GLM ESTIMATE. This is a convenient version of est
function.
Estimate |
point estimate of the input linear contrast |
Lower CL |
lower confidence limit |
Upper CL |
upper confidence limit |
Std. Error |
standard error of the point estimate |
t value |
value for t distribution |
Df |
degree of freedom |
Pr(>|t|) |
probability of larger than absolute t value from t distribution with residual's degree of freedom |
Kyun-Seop Bae [email protected]
ESTM(t(c(0, -1, 1)), uptake ~ Type, CO2) # Quevec - Mississippi
ESTM(t(c(0, -1, 1)), uptake ~ Type, CO2) # Quevec - Mississippi
Check the estimability of row vectors of coefficients.
estmb(L, X, g2, eps=1e-8)
estmb(L, X, g2, eps=1e-8)
L |
row vectors of coefficients |
X |
a model (design) matrix from |
g2 |
g2 generalized inverse of |
eps |
absolute value less than this is considered to be zero. |
It checks the estimability of L, row vectors of coefficients. This corresponds to SAS PROC GLM ESTIMATE. See <Kennedy Jr. WJ, Gentle JE. Statistical Computing. 1980> p361 or <Golub GH, Styan GP. Numerical Computations for Univariate Linear Models. 1971>.
a vector of logical values indicating which row is estimable (as TRUE)
Kyun-Seop Bae [email protected]
Exit probabilities with given drift, upper bounds, and times of test.
ExitP(Theta, bi, ti=NULL)
ExitP(Theta, bi, ti=NULL)
Theta |
drift value defined by Lan-DeMets. See the reference. |
bi |
upper bound z-values |
ti |
times of test. These should be in the range of [0, 1]. If omitted, even-interval is assumed. |
It calculates exit proabilities and cumulative exit probabilities with given drift, upper z-bounds and times of test. If the times of test is not given, even-interval is assumed. mvtnorm::pmvt
(with noncentrality) is better than pmvnorm in calculating power and sample size. But, Lan-DeMets used multi-variate normal rather than multi-variate noncentral t distributionh. This function followed Lan-DeMets for the consistency with previous results.
The result is a matrix.
ti |
time of test |
bi |
upper z-bound |
cum.alpha |
cumulative alpha-value |
Kyun-Seop Bae [email protected]
Reboussin DM, DeMets DL, Kim K, Lan KKG. Computations for group sequential boundaries using the Lan-DeMets function method. Controlled Clinical Trials. 2000;21:190-207.
b0 = seqBound(ti=(1:5)/5)[, "up.bound"] ExitP(Theta = Drift(b0), bi = b0)
b0 = seqBound(ti=(1:5)/5)[, "up.bound"] ExitP(Theta = Drift(b0), bi = b0)
Generalized inverse is usually not unique. Some programs use this algorithm to get a unique generalized inverse matrix. This uses SWEEP operator and works for non-square matrix also.
g2inv(A, eps=1e-08)
g2inv(A, eps=1e-08)
A |
a matrix to be inverted |
eps |
Less than this value is considered as zero. |
See 'SAS Technical Report R106, The Sweep Operator: Its importance in Statistical Computing' by J. H. Goodnight for the detail.
g2 inverse
Kyun-Seop Bae [email protected]
Searle SR, Khuri AI. Matrix Algebra Useful for Statistics. 2e. John Wiley and Sons Inc. 2017.
A = matrix(c(1, 2, 4, 3, 3, -1, 2, -2, 5, -4, 0, -7), byrow=TRUE, ncol=4) ; A g2inv(A)
A = matrix(c(1, 2, 4, 3, 3, -1, 2, -2, 5, -4, 0, -7), byrow=TRUE, ncol=4) ; A g2inv(A)
Generalized inverse is usually not unique. Some programs use this algorithm to get a unique generalized inverse matrix.
G2SWEEP(A, Augmented=FALSE, eps=1e-08)
G2SWEEP(A, Augmented=FALSE, eps=1e-08)
A |
a matrix to be inverted. If |
Augmented |
If this is |
eps |
Less than this value is considered as zero. |
Generalized inverse of g2-type is used by some softwares to do linear regression. See 'SAS Technical Report R106, The Sweep Operator: Its importance in Statistical Computing' by J. H. Goodnight for the detail.
when Augmented=FALSE |
ordinary g2 inverse |
when Augmented=TRUE |
g2 inverse and beta hats in the last column and the last row, and sum of square error (SSE) in the last cell |
attribute "rank" |
the rank of input matrix |
Kyun-Seop Bae [email protected]
f1 = uptake ~ Type + Treatment # formula x = ModelMatrix(f1, CO2) # Model matrix and relevant information y = model.frame(f1, CO2)[, 1] # observation vector nc = ncol(x$X) # number of columns of model matrix XpY = crossprod(x$X, y) aXpX = rbind(cbind(crossprod(x$X), XpY), cbind(t(XpY), crossprod(y))) ag2 = G2SWEEP(aXpX, Augmented=TRUE) b = ag2[1:nc, (nc + 1)] ; b # Beta hat iXpX = ag2[1:nc, 1:nc] ; iXpX # g2 inverse of X'X SSE = ag2[(nc + 1), (nc + 1)] ; SSE # Sum of Square Error DFr = nrow(x$X) - attr(ag2, "rank") ; DFr # Degree of freedom for the residual # Compare the below with the above REG(f1, CO2) aov1(f1, CO2)
f1 = uptake ~ Type + Treatment # formula x = ModelMatrix(f1, CO2) # Model matrix and relevant information y = model.frame(f1, CO2)[, 1] # observation vector nc = ncol(x$X) # number of columns of model matrix XpY = crossprod(x$X, y) aXpX = rbind(cbind(crossprod(x$X), XpY), cbind(t(XpY), crossprod(y))) ag2 = G2SWEEP(aXpX, Augmented=TRUE) b = ag2[1:nc, (nc + 1)] ; b # Beta hat iXpX = ag2[1:nc, 1:nc] ; iXpX # g2 inverse of X'X SSE = ag2[(nc + 1), (nc + 1)] ; SSE # Sum of Square Error DFr = nrow(x$X) - attr(ag2, "rank") ; DFr # Degree of freedom for the residual # Compare the below with the above REG(f1, CO2) aov1(f1, CO2)
Geometric coefficient of variation in percentage.
geoCV(y)
geoCV(y)
y |
a numeric vector |
It removes NA
. This is sqrt(exp(var(log(x))) - 1)*100.
Geometric coefficient of variation in percentage.
Kyun-Seop Bae [email protected]
geoCV(mtcars$mpg)
geoCV(mtcars$mpg)
mean without NA
values.
geoMean(y)
geoMean(y)
y |
a vector of numerics |
It removes NA
in the input vector.
geometric mean value
Kyun-Seop Bae [email protected]
geoMean(mtcars$mpg)
geoMean(mtcars$mpg)
GLM is the main function of this package.
GLM(Formula, Data, BETA=FALSE, EMEAN=FALSE, Resid=FALSE, conf.level=0.95, Weights=1)
GLM(Formula, Data, BETA=FALSE, EMEAN=FALSE, Resid=FALSE, conf.level=0.95, Weights=1)
Formula |
a conventional formula for a linear model. |
Data |
a |
BETA |
if |
EMEAN |
if |
Resid |
if |
conf.level |
confidence level for the confidence limit of the least square mean |
Weights |
weights for the weighted least square |
It performs the core function of SAS PROC GLM. Least square means for the interaction term of three variables is not supported yet.
The result is comparable to that of SAS PROC GLM.
ANOVA |
ANOVA table for the model |
Fitness |
Some measures of goodness of fit such as R-square and CV |
Type I |
Type I sum of square table |
Type II |
Type II sum of square table |
Type III |
Type III sum of square table |
Parameter |
Parameter table with standard error, t value, p value. |
Expected Mean |
Least square (or expected) mean table with confidence limit. This is returned only with EMEAN=TRUE option. |
Fitted |
Fitted value or y hat. This is returned only with Resid=TRUE option. |
Residual |
Weigthed residuals. This is returned only with Resid=TRUE option. |
Kyun-Seop Bae [email protected]
GLM(uptake ~ Type*Treatment + conc, CO2[-1,]) # Making data unbalanced GLM(uptake ~ Type*Treatment + conc, CO2[-1,], BETA=TRUE) GLM(uptake ~ Type*Treatment + conc, CO2[-1,], EMEAN=TRUE) GLM(uptake ~ Type*Treatment + conc, CO2[-1,], Resid=TRUE) GLM(uptake ~ Type*Treatment + conc, CO2[-1,], BETA=TRUE, EMEAN=TRUE) GLM(uptake ~ Type*Treatment + conc, CO2[-1,], BETA=TRUE, EMEAN=TRUE, Resid=TRUE)
GLM(uptake ~ Type*Treatment + conc, CO2[-1,]) # Making data unbalanced GLM(uptake ~ Type*Treatment + conc, CO2[-1,], BETA=TRUE) GLM(uptake ~ Type*Treatment + conc, CO2[-1,], EMEAN=TRUE) GLM(uptake ~ Type*Treatment + conc, CO2[-1,], Resid=TRUE) GLM(uptake ~ Type*Treatment + conc, CO2[-1,], BETA=TRUE, EMEAN=TRUE) GLM(uptake ~ Type*Treatment + conc, CO2[-1,], BETA=TRUE, EMEAN=TRUE, Resid=TRUE)
Testing if the input matrix is a correlation matrix or not
is.cor(m, eps=1e-16)
is.cor(m, eps=1e-16)
m |
a presumed correlation matrix |
eps |
epsilon value. An absolute value less than this is considered as zero. |
A diagonal component should not be necessarily 1. But it should be close to 1.
TRUE or FALSE
Kyun-Seop Bae [email protected]
Kurtosis with a conventional formula.
Kurtosis(y)
Kurtosis(y)
y |
a vector of numerics |
It removes NA
in the input vector.
Estimate of kurtosis
Kyun-Seop Bae [email protected]
Standard error of the estimated kurtosis with a conventional formula.
KurtosisSE(y)
KurtosisSE(y)
y |
a vector of numerics |
It removes NA
in the input vector.
Standard error of the estimated kurtosis
Kyun-Seop Bae [email protected]
The estimate of the lower bound of confidence limit using t-distribution
LCL(y, conf.level=0.95)
LCL(y, conf.level=0.95)
y |
a vector of numerics |
conf.level |
confidence level |
It removes NA
in the input vector.
The estimate of the lower bound of confidence limit using t-distribution
Kyun-Seop Bae [email protected]
Fits a least square linear model.
lfit(x, y, eps=1e-8)
lfit(x, y, eps=1e-8)
x |
a result of ModelMatrix |
y |
a column vector of response, dependent variable |
eps |
Less than this value is considered as zero. |
Minimum version of least square fit of a linear model
coeffcients |
beta coefficients |
g2 |
g2 inverse |
rank |
rank of the model matrix |
DFr |
degree of freedom for the residual |
SSE |
sum of squares error |
SST |
sum of squares total |
DFr2 |
degree of freedom of the residual for beta coefficient |
Kyun-Seop Bae [email protected]
f1 = uptake ~ Type*Treatment + conc x = ModelMatrix(f1, CO2) y = model.frame(f1, CO2)[,1] lfit(x, y)
f1 = uptake ~ Type*Treatment + conc x = ModelMatrix(f1, CO2) y = model.frame(f1, CO2)[,1] lfit(x, y)
Coefficients calculated with g2 inverse. Output is similar to summary(lm())
.
lr(Formula, Data, eps=1e-8)
lr(Formula, Data, eps=1e-8)
Formula |
a conventional formula for a linear model |
Data |
a |
eps |
Less than this value is considered as zero. |
It uses G2SWEEP to get g2 inverse. The result is similar to summary(lm())
without options.
The result is comparable to that of SAS PROC REG.
Estimate |
point estimate of parameters, coefficients |
Std. Error |
standard error of the point estimate |
t value |
value for t distribution |
Pr(>|t|) |
probability of larger than absolute t value from t distribution with residual's degree of freedom |
Kyun-Seop Bae [email protected]
lr(uptake ~ Plant + Type + Treatment + conc, CO2) lr(uptake ~ Plant + Type + Treatment + conc - 1, CO2) lr(uptake ~ Type, CO2) lr(uptake ~ Type - 1, CO2)
lr(uptake ~ Plant + Type + Treatment + conc, CO2) lr(uptake ~ Plant + Type + Treatment + conc - 1, CO2) lr(uptake ~ Type, CO2) lr(uptake ~ Type - 1, CO2)
Usually, the first step to multiple linear regression is simple linear regressions with a single independent variable.
lr0(Formula, Data)
lr0(Formula, Data)
Formula |
a conventional formula for a linear model. Intercept will always be added. |
Data |
a |
It performs simple linear regression for each independent variable.
Each row means one simple linear regression with that row name as the only independent variable.
Intercept |
estimate of the intecept |
SE(Intercept) |
standard error of the intercept |
Slope |
estimate of the slope |
SE(Slope) |
standard error of the slope |
Rsq |
R-squared for the simple linear model |
Pr(>F) |
p-value of slope or the model |
Kyun-Seop Bae [email protected]
lr0(uptake ~ Plant + Type + Treatment + conc, CO2) lr0(mpg ~ ., mtcars)
lr0(uptake ~ Plant + Type + Treatment + conc, CO2) lr0(mpg ~ ., mtcars)
Estimates least square means using g2 inverse.
LSM(Formula, Data, Term, conf.level=0.95, adj="lsd", hideNonEst=TRUE, PLOT=FALSE, descend=FALSE, ...)
LSM(Formula, Data, Term, conf.level=0.95, adj="lsd", hideNonEst=TRUE, PLOT=FALSE, descend=FALSE, ...)
Formula |
a conventional formula of model |
Data |
data.frame |
Term |
term name to be returned. If there is only one independent variable, this can be omitted. |
conf.level |
confidence level for the confidence limit |
adj |
adjustment method for grouping, "lsd"(default), "tukey", "bon", "duncan", "scheffe" are available. This does not affects SE, Lower CL, Upper CL of the output table. |
hideNonEst |
logical. hide non-estimables |
PLOT |
logical. whether to plot LSMs and their confidence intervals |
descend |
logical. This specifies the plotting order be ascending or descending. |
... |
arguments to be passed to |
It corresponds to SAS PROC GLM LSMEANS. The result of the second example below may be different from emmeans
. This is because SAS or this function calculates mean of the transformed continuous variable. However, emmeans
calculates the average before the transformation. Interaction of three variables is not supported yet. For adjust method "dunnett", see PDIFF
function.
Returns a table of expectations, t values and p-values.
Group |
group character. This appears with one-way ANOVA or |
LSmean |
point estimate of least square mean |
LowerCL |
lower confidence limit with the given confidence level by "lsd" method |
UpperCL |
upper confidence limit with the given confidence level by "lsd" method |
SE |
standard error of the point estimate |
Df |
degree of freedom of point estimate |
Kyun-Seop Bae [email protected]
LSM(uptake ~ Type, CO2[-1,]) LSM(uptake ~ Type - 1, CO2[-1,]) LSM(uptake ~ Type*Treatment + conc, CO2[-1,]) LSM(uptake ~ Type*Treatment + conc - 1, CO2[-1,]) LSM(log(uptake) ~ Type*Treatment + log(conc), CO2[-1,]) LSM(log(uptake) ~ Type*Treatment + log(conc) - 1, CO2[-1,]) LSM(log(uptake) ~ Type*Treatment + as.factor(conc), CO2[-1,]) LSM(log(uptake) ~ Type*Treatment + as.factor(conc) - 1, CO2[-1,]) LSM(log(CMAX) ~ SEQ/SUBJ + PRD + TRT, BEdata) LSM(log(CMAX) ~ SEQ/SUBJ + PRD + TRT - 1, BEdata)
LSM(uptake ~ Type, CO2[-1,]) LSM(uptake ~ Type - 1, CO2[-1,]) LSM(uptake ~ Type*Treatment + conc, CO2[-1,]) LSM(uptake ~ Type*Treatment + conc - 1, CO2[-1,]) LSM(log(uptake) ~ Type*Treatment + log(conc), CO2[-1,]) LSM(log(uptake) ~ Type*Treatment + log(conc) - 1, CO2[-1,]) LSM(log(uptake) ~ Type*Treatment + as.factor(conc), CO2[-1,]) LSM(log(uptake) ~ Type*Treatment + as.factor(conc) - 1, CO2[-1,]) LSM(log(CMAX) ~ SEQ/SUBJ + PRD + TRT, BEdata) LSM(log(CMAX) ~ SEQ/SUBJ + PRD + TRT - 1, BEdata)
maximum without NA
values.
Max(y)
Max(y)
y |
a vector of numerics |
It removes NA
in the input vector.
maximum value
Kyun-Seop Bae [email protected]
mean without NA
values.
Mean(y)
Mean(y)
y |
a vector of numerics |
It removes NA
in the input vector.
mean value
Kyun-Seop Bae [email protected]
median without NA
values.
Median(y)
Median(y)
y |
a vector of numerics |
It removes NA
in the input vector.
median value
Kyun-Seop Bae [email protected]
minimum without NA
values.
Min(y)
Min(y)
y |
a vector of numerics |
It removes NA
in the input vector.
minimum value
Kyun-Seop Bae [email protected]
This model matrix is similar to model.matrix
. But it does not omit unnecessary columns.
ModelMatrix(Formula, Data, KeepOrder=FALSE, XpX=FALSE)
ModelMatrix(Formula, Data, KeepOrder=FALSE, XpX=FALSE)
Formula |
a conventional formula for a linear model |
Data |
a |
KeepOrder |
If |
XpX |
If |
It makes the model(design) matrix for GLM
.
Model matrix and attributes similar to the output of model.matrix
.
X |
design matrix, i.e. model matrix |
XpX |
cross-product of the design matrix, X'X |
terms |
detailed information about terms such as formula and labels |
termsIndices |
term indices |
assign |
assignemnt of columns for each term in order, different way of expressing term indices |
Kyun-Seop Bae [email protected]
This is comparable to SAS PROC TTEST except using summarized input (sufficient statistics).
mtest(m1, s1, n1, m0, s0, n0, conf.level=0.95)
mtest(m1, s1, n1, m0, s0, n0, conf.level=0.95)
m1 |
mean of the first (test, active, experimental) group |
s1 |
sample standard deviation of the first group |
n1 |
sample size of the first group |
m0 |
mean of the second (reference, control, placebo) group |
s0 |
sample standard deviation of the second group |
n0 |
sample size of the second group |
conf.level |
confidence level |
This uses summarized input. This also produces confidence intervals of means and variances by group.
The output format is comparable to SAS PROC TTEST.
Kyun-Seop Bae [email protected]
mtest(5.4, 10.5, 3529, 5.1, 8.9, 5190) # NEJM 388;15 p1386
mtest(5.4, 10.5, 3529, 5.1, 8.9, 5190) # NEJM 388;15 p1386
Number of observations excluding NA
values
N(y)
N(y)
y |
a vector of numerics |
It removes NA
in the input vector.
Count of the observation
Kyun-Seop Bae [email protected]
Odds Ratio between two groups
OR(y1, n1, y2, n2, conf.level=0.95)
OR(y1, n1, y2, n2, conf.level=0.95)
y1 |
positive event count of test (the first) group |
n1 |
total count of the test (the first) group |
y2 |
positive event count of control (the second) group |
n2 |
total count of control (the second) group |
conf.level |
confidence level |
It calculates odds ratio of two groups. No continuity correction here. If you need percent scale, multiply the output by 100.
The result is a data.frame.
odd1 |
proportion from the first group |
odd2 |
proportion from the second group |
OR |
odds ratio, odd1/odd2 |
SElog |
standard error of log(OR) |
lower |
lower confidence limit of OR |
upper |
upper confidence limit of OR |
Kyun-Seop Bae [email protected]
RD
, RR
, RDmn1
, RRmn1
, ORmn1
, RDmn
, RRmn
, ORmn
OR(104, 11037, 189, 11034) # no continuity correction
OR(104, 11037, 189, 11034) # no continuity correction
Odds ratio and its score confidence interval of two groups with stratification by Cochran-Mantel-Haenszel method
ORcmh(d0, conf.level=0.95)
ORcmh(d0, conf.level=0.95)
d0 |
A data.frame or matrix, of which each row means a strata. This should have four columns named y1, n1, y2, and n2; y1 and y2 for events of each group, n1 and n2 for sample size of each strata. The second group is usually the control group. |
conf.level |
confidence level |
It calculates odds ratio and its score confidence interval of two groups. This can be used for meta-analysis also.
The following output will be returned for each stratum and common value. There is no standard error.
odd1 |
odd from the first group, y1/(n1 - y1) |
odd2 |
odd from the second group, y2/(n2 - y2) |
OR |
odds ratio, odd1/odd2. The point estimate of common OR is calculated with MH weight. |
lower |
lower confidence limit of OR |
upper |
upper confidence limit of OR |
Kyun-Seop Bae [email protected]
RDmn1
, RRmn1
, ORmn1
, RDmn
, RRmn
, ORmn
, RDinv
, RRinv
, ORinv
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") ORcmh(d1)
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") ORcmh(d1)
Odds ratio and its score confidence interval of two groups with stratification by inverse variance method
ORinv(d0, conf.level=0.95)
ORinv(d0, conf.level=0.95)
d0 |
A data.frame or matrix, of which each row means a stratum. This should have four columns named y1, n1, y2, and n2; y1 and y2 for events of each group, n1 and n2 for sample size of each strata. The second group is usually the control group. |
conf.level |
confidence level |
It calculates odds ratio and its confidence interval of two groups by inverse variance method. This supports stratification. This can be used for meta-analysis also.
The following output will be returned for each stratum and common value. There is no standard error.
odd1 |
odd from the first group, y1/(n1 - y1) |
odd2 |
odd from the second group, y2/(n2 - y2) |
OR |
odds ratio, odd1/odd2. The point estimate of common OR is calculated with MH weight. |
lower |
lower confidence limit of OR |
upper |
upper confidence limit of OR |
Kyun-Seop Bae [email protected]
RDmn1
, RRmn1
, ORmn1
, RDmn
, RRmn
, ORmn
, RDinv
, RRinv
, ORcmh
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") ORinv(d1)
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") ORinv(d1)
Odds ratio and its score confidence interval of two groups with stratification by the Miettinen and Nurminen method
ORmn(d0, conf.level=0.95, eps=1e-8)
ORmn(d0, conf.level=0.95, eps=1e-8)
d0 |
A data.frame or matrix, of which each row means a strata. This should have four columns named y1, n1, y2, and n2; y1 and y2 for events of each group, n1 and n2 for sample size of each strata. The second group is usually the control group. |
conf.level |
confidence level |
eps |
absolute value less than eps is regarded as negligible |
It calculates odds ratio and its score confidence interval of the two groups. The confidence interval is asymmetric, and there is no standard error in the output. This supports stratification. This implementation uses uniroot function, which usually gives at least 5 significant digits. Whereas PropCIs::orscoreci function uses incremental or decremental search by the factor of 1.001 which gives only about 3 significant digits. This can be used for meta-analysis also.
The following output will be returned for each stratum and common value. There is no standard error.
odd1 |
odd from the first group, y1/(n1 - y1) |
odd2 |
odd from the second group, y2/(n2 - y2) |
OR |
odds ratio, odd1/odd2. The point estimate of common OR is calculated with MN weight. |
lower |
lower confidence limit of OR |
upper |
upper confidence limit of OR |
Kyun-Seop Bae [email protected]
Miettinen O, Nurminen M. Comparative analysis of two rates. Stat Med 1985;4:213-26
RDmn1
, RRmn1
, ORmn1
, RDmn
, RRmn
, RDinv
, RRinv
, ORinv
, ORcmh
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") ORmn(d1)
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") ORmn(d1)
Odds ratio and its score confidence interval of two groups without stratification
ORmn1(y1, n1, y2, n2, conf.level=0.95, eps=1e-8)
ORmn1(y1, n1, y2, n2, conf.level=0.95, eps=1e-8)
y1 |
positive event count of test (the first) group |
n1 |
total count of the test (the first) group |
y2 |
positive event count of control (the second) group |
n2 |
total count of control (the second) group |
conf.level |
confidence level |
eps |
absolute value less than eps is regarded as negligible |
It calculates odds ratio and its score confidence interval of the two groups. The confidence interval is asymmetric, and there is no standard error in the output. This does not support stratification. This implementation uses uniroot function, which usually gives at least 5 significant digits. Whereas PropCIs::orscoreci function uses incremental or decremental search by the factor of 1.001 which gives only less than 3 significant digits.
There is no standard error.
odd1 |
odd from the first group, y1/(n1 - y1) |
odd2 |
odd from the second group, y2/(n2 - y2) |
OR |
odds ratio, odd1/odd2 |
lower |
lower confidence limit of OR |
upper |
upper confidence limit of OR |
Kyun-Seop Bae [email protected]
Miettinen O, Nurminen M. Comparative analysis of two rates. Stat Med 1985;4:213-26
RDmn1
, RRmn1
, RDmn
, RRmn
, ORmn
ORmn1(104, 11037, 189, 11034)
ORmn1(104, 11037, 189, 11034)
It plots bands of the confidence interval and prediction interval for simple linear regression.
pB(Formula, Data, Resol=300, conf.level=0.95, lx, ly, ...)
pB(Formula, Data, Resol=300, conf.level=0.95, lx, ly, ...)
Formula |
a formula |
Data |
a data.frame |
Resol |
resolution for the output |
conf.level |
confidence level |
lx |
x position of legend |
ly |
y position of legend |
... |
arguments to be passed to |
It plots. Discard return values. If lx
or ly
is missing, the legend position is calculated automatically.
Ignore return values.
Kyun-Seop Bae [email protected]
pB(hp ~ disp, mtcars) pB(mpg ~ disp, mtcars)
pB(hp ~ disp, mtcars) pB(mpg ~ disp, mtcars)
Testing partial correlation between many columns of data with Pearson method.
Pcor.test(Data, x, y)
Pcor.test(Data, x, y)
Data |
a numeric matrix or data.frame |
x |
names of columns to be tested |
y |
names of control columns |
It performs multiple partial correlation test. It uses "complete.obs" rows of x and y columns.
Row names show which columns are used for the test
Estimate |
point estimate of correlation |
Df |
degree of freedom |
t value |
t value of the t distribution |
Pr(>|t|) |
probability with the t distribution |
Kyun-Seop Bae [email protected]
Pcor.test(mtcars, c("mpg", "hp", "qsec"), c("drat", "wt"))
Pcor.test(mtcars, c("mpg", "hp", "qsec"), c("drat", "wt"))
Four standard diagnostic plots for regression.
pD(rx, Title=NULL)
pD(rx, Title=NULL)
rx |
a result of lm, which can give |
Title |
title to be printed on the plot |
Most frequently used diagnostic plots are 'observed vs. fitted', 'standardized residual vs. fitted', 'distribution plot of standard residuals', and 'Q-Q plot of standardized residuals'.
Four diagnostic plots in a page.
Kyun-Seop Bae [email protected]
pD(lm(uptake ~ Plant + Type + Treatment + conc, CO2), "Diagnostic Plot")
pD(lm(uptake ~ Plant + Type + Treatment + conc, CO2), "Diagnostic Plot")
Estimates pairwise differences by a common method.
PDIFF(Formula, Data, Term, conf.level=0.95, adj="lsd", ref, PLOT=FALSE, reverse=FALSE, ...)
PDIFF(Formula, Data, Term, conf.level=0.95, adj="lsd", ref, PLOT=FALSE, reverse=FALSE, ...)
Formula |
a conventional formula for a linear model |
Data |
a |
Term |
a factor name to be estimated |
conf.level |
confidence level of confidence interval |
adj |
"lsd", "tukey", "scheffe", "bon", "duncan", or "dunnett" to adjust p-value and confidence limit |
ref |
reference or control level for Dunnett test |
PLOT |
whether to plot or not the diffogram |
reverse |
reverse A - B to B - A |
... |
arguments to be passed to |
It corresponds to PDIFF option of SAS PROC GLM.
Returns a table of expectations, t values and p-values. Output columns may vary according to the adjustment option.
Estimate |
point estimate of the input linear contrast |
Lower CL |
lower confidence limit |
Upper CL |
upper confidence limit |
Std. Error |
standard error of the point estimate |
t value |
value for t distribution |
Df |
degree of freedom |
Pr(>|t|) |
probability of larger than absolute t value from t distribution with residual's degree of freedom |
Kyun-Seop Bae [email protected]
PDIFF(uptake ~ Type*Treatment + as.factor(conc), CO2, "as.factor(conc)") PDIFF(uptake ~ Type*Treatment + as.factor(conc), CO2, "as.factor(conc)", adj="tukey")
PDIFF(uptake ~ Type*Treatment + as.factor(conc), CO2, "as.factor(conc)") PDIFF(uptake ~ Type*Treatment + as.factor(conc), CO2, "as.factor(conc)", adj="tukey")
Cumulative alpha values with cumulative hypothesis test with a fixed upper bound z-value in group sequential design.
PocockBound(K=2, alpha=0.05, side=2)
PocockBound(K=2, alpha=0.05, side=2)
K |
total number of tests |
alpha |
alpha value at the final test |
side |
1=one-side test, 2=two-side test |
Pocock suggested a fixed upper bound z-value for the cumulative hypothesis test in group sequential designs.
a fixed upper bound z-value for the K times repated hypothesis test with a final alpha-value. Attributes are;
ti |
time of test, Even-interval is assumed. |
cum.alpha |
cumulative alpha valued |
Kyun-Seop Bae [email protected]
Reboussin DM, DeMets DL, Kim K, Lan KKG. Computations for group sequential boundaries using the Lan-DeMets function method. Controlled Clinical Trials. 2000;21:190-207.
PocockBound(K=2) # Z-value of upper bound for the two-stage design
PocockBound(K=2) # Z-value of upper bound for the two-stage design
Nine residual diagnostics plots.
pResD(rx, Title=NULL)
pResD(rx, Title=NULL)
rx |
a result of lm, which can give |
Title |
title to be printed on the plot |
SAS-style residual diagnostic plots.
Nine residual diagnostic plots in a page.
Kyun-Seop Bae [email protected]
pResD(lm(uptake ~ Plant + Type + Treatment + conc, CO2), "Residual Diagnostic Plot")
pResD(lm(uptake ~ Plant + Type + Treatment + conc, CO2), "Residual Diagnostic Plot")
Interquartile range (Q3 - Q1) with a conventional formula.
QuartileRange(y, Type=2)
QuartileRange(y, Type=2)
y |
a vector of numerics |
Type |
a type specifier to be passed to |
It removes NA
in the input vector. Type 2 is SAS default, while Type 6 is SPSS default.
The value of an interquartile range
Kyun-Seop Bae [email protected]
The range, maximum - minimum, as a scalar value.
Range(y)
Range(y)
y |
a vector of numerics |
It removes NA
in the input vector.
A scalar value of a range
Kyun-Seop Bae [email protected]
Hypothesis test of with specified type SS using random effects as error terms. This corresponds to SAS PROC GLM's RANDOM /TEST clause.
RanTest(Formula, Data, Random="", Type=3, eps=1e-8)
RanTest(Formula, Data, Random="", Type=3, eps=1e-8)
Formula |
a conventional formula for a linear model |
Data |
a |
Random |
a vector of random effects. All should be specified as primary terms, not as interaction terms. All interaction terms with random factor are regarded as random effects. |
Type |
Sum of square type to be used as contrast |
eps |
Less than this value is considered as zero. |
Type can be from 1 to 3. All interaction terms with random factor are regarded as random effects. Here the error term should not be MSE.
Returns ANOVA and E(MS) tables with specified type SS.
Kyun-Seop Bae [email protected]
RanTest(log(CMAX) ~ SEQ/SUBJ + PRD + TRT, BEdata, Random="SUBJ") fBE = log(CMAX) ~ ADM/SEQ/SUBJ + PRD + TRT RanTest(fBE, BEdata, Random=c("ADM", "SUBJ")) RanTest(fBE, BEdata, Random=c("ADM", "SUBJ"), Type=2) RanTest(fBE, BEdata, Random=c("ADM", "SUBJ"), Type=1)
RanTest(log(CMAX) ~ SEQ/SUBJ + PRD + TRT, BEdata, Random="SUBJ") fBE = log(CMAX) ~ ADM/SEQ/SUBJ + PRD + TRT RanTest(fBE, BEdata, Random=c("ADM", "SUBJ")) RanTest(fBE, BEdata, Random=c("ADM", "SUBJ"), Type=2) RanTest(fBE, BEdata, Random=c("ADM", "SUBJ"), Type=1)
Risk (proportion) difference between two groups
RD(y1, n1, y2, n2, conf.level=0.95)
RD(y1, n1, y2, n2, conf.level=0.95)
y1 |
positive event count of test (the first) group |
n1 |
total count of the test (the first) group |
y2 |
positive event count of control (the second) group |
n2 |
total count of control (the second) group |
conf.level |
confidence level |
It calculates risk difference between the two groups. No continuity correction here. If you need percent scale, multiply the output by 100.
The result is a data.frame.
p1 |
proportion from the first group |
p2 |
proportion from the second group |
RD |
risk difference, p1 - p2 |
SE |
standard error of RD |
lower |
lower confidence limit of RD |
upper |
upper confidence limit of RD |
Kyun-Seop Bae [email protected]
RR
, OR
, RDmn1
, RRmn1
, ORmn1
, RDmn
, RRmn
, ORmn
RD(104, 11037, 189, 11034) # no continuity correction
RD(104, 11037, 189, 11034) # no continuity correction
Risk difference and its score confidence interval between two groups with stratification by inverse variance method
RDinv(d0, conf.level=0.95)
RDinv(d0, conf.level=0.95)
d0 |
A data.frame or matrix, of which each row means a stratum. This should have four columns named y1, n1, y2, and n2; y1 and y2 for events of each group, n1 and n2 for the sample size of each stratum. The second group is usually the control group. |
conf.level |
confidence level |
It calculates risk difference and its confidence interval between two groups by inverse variance method. If you need percent scale, multiply the output by 100. This supports stratification. This can be used for meta-analysis also.
The following output will be returned for each stratum and common value. There is no standard error.
p1 |
proportion from the first group, y1/n1 |
p2 |
proportion from the second group, y2/n2 |
RD |
risk difference, p1 - p2. The point estimate of common RD is calculated with MH weight. |
lower |
lower confidence limit of RD |
upper |
upper confidence limit of RD |
Kyun-Seop Bae [email protected]
RDmn1
, RRmn1
, ORmn1
, RDmn
, RRmn
, ORmn
, RRinv
, ORinv
, ORcmh
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") RDinv(d1)
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") RDinv(d1)
Risk difference and its score confidence interval between two groups with stratification by the Miettinen and Nurminen method
RDmn(d0, conf.level=0.95, eps=1e-8)
RDmn(d0, conf.level=0.95, eps=1e-8)
d0 |
A data.frame or matrix, of which each row means a stratum. This should have four columns named y1, n1, y2, and n2; y1 and y2 for events of each group, n1 and n2 for sample size of each stratum. The second group is usually the control group. Maximum allowable value for n1 and n2 is 1e8. |
conf.level |
confidence level |
eps |
absolute value less than eps is regarded as negligible |
It calculates risk difference and its score confidence interval between the two groups. The confidence interval is asymmetric, and there is no standard error in the output. If you need percent scale, multiply the output by 100. This supports stratification. This implementation uses uniroot function which usually gives at least 5 significant digits. This can be used for meta-analysis also.
The following output will be returned for each stratum and common value. There is no standard error.
p1 |
proportion from the first group, y1/n1 |
p2 |
proportion from the second group, y2/n2 |
RD |
risk difference, p1 - p2. The point estimate of common RD is calculated with MN weight. |
lower |
lower confidence limit of RD |
upper |
upper confidence limit of RD |
Kyun-Seop Bae [email protected]
Miettinen O, Nurminen M. Comparative analysis of two rates. Stat Med 1985;4:213-26
RDmn1
, RRmn1
, ORmn1
, RRmn
, ORmn
, RDinv
, RRinv
, ORinv
, ORcmh
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") RDmn(d1)
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") RDmn(d1)
Risk difference and its score confidence interval between two groups without stratification
RDmn1(y1, n1, y2, n2, conf.level=0.95, eps=1e-8)
RDmn1(y1, n1, y2, n2, conf.level=0.95, eps=1e-8)
y1 |
positive event count of test (the first) group |
n1 |
total count of the test (the first) group. Maximum allowable value is 1e8. |
y2 |
positive event count of control (the second) group |
n2 |
total count of control (the second) group. Maximum allowable value is 1e8. |
conf.level |
confidence level |
eps |
absolute value less than eps is regarded as negligible |
It calculates risk difference and its score confidence interval between the two groups. The confidence interval is asymmetric, and there is no standard error in the output. If you need percent scale, multiply the output by 100. This does not support stratification. This implementation uses uniroot function which usually gives at least 5 significant digits.
There is no standard error.
p1 |
proportion from the first group, y1/n1 |
p2 |
proportion from the second group, y2/n2 |
RD |
risk difference, p1 - p2 |
lower |
lower confidence limit of RD |
upper |
upper confidence limit of RD |
Kyun-Seop Bae [email protected]
Miettinen O, Nurminen M. Comparative analysis of two rates. Stat Med 1985;4:213-26
RRmn1
, ORmn1
, RDmn
, RRmn
, ORmn
RDmn1(104, 11037, 189, 11034)
RDmn1(104, 11037, 189, 11034)
REG is similar to SAS PROC REG.
REG(Formula, Data, conf.level=0.95, HC=FALSE, Resid=FALSE, Weights=1, summarize=TRUE)
REG(Formula, Data, conf.level=0.95, HC=FALSE, Resid=FALSE, Weights=1, summarize=TRUE)
Formula |
a conventional formula for a linear model |
Data |
a |
conf.level |
confidence level for the confidence limit |
HC |
heteroscedasticity related output is required such as HC0, HC3, White's first and second moment specification test |
Resid |
if |
Weights |
weights for each observation or residual square. This is usually the inverse of each variance. |
summarize |
If this is |
It performs the core function of SAS PROC REG.
The result is comparable to that of SAS PROC REG.
The first part is ANOVA table.
The second part is measures about fitness.
The third part is the estimates of coefficients.
Estimate |
point estimate of parameters, coefficients |
Estimable |
estimability: 1=TRUE, 0=FALSE. This appears only when at least one inestimability occurs. |
Std. Error |
standard error of the point estimate |
Lower CL |
lower confidence limit with conf.level |
Upper CL |
lower confidence limit with conf.level |
Df |
degree of freedom |
t value |
value for t distribution |
Pr(>|t|) |
probability of larger than absolute t value from t distribution with residual's degree of freedom |
The above result is repeated using HC0 and HC3, with following White's first and second moment specification test, if HC option is specified. The t values and their p values with HC1 and HC2 are between those of HC0 and H3.
Fitted |
Fitted value or y hat. This is returned only with Resid=TRUE option. |
Residual |
Weighted residuals. This is returned only with Resid=TRUE option. |
If summarize=FALSE
, REG
returns;
coeffcients |
beta coefficients |
g2 |
g2 inverse |
rank |
rank of the model matrix |
DFr |
degree of freedom for the residual |
SSE |
sum of square error |
Kyun-Seop Bae [email protected]
REG(uptake ~ Plant + Type + Treatment + conc, CO2) REG(uptake ~ conc, CO2, HC=TRUE) REG(uptake ~ conc, CO2, Resid=TRUE) REG(uptake ~ conc, CO2, HC=TRUE, Resid=TRUE) REG(uptake ~ conc, CO2, summarize=FALSE)
REG(uptake ~ Plant + Type + Treatment + conc, CO2) REG(uptake ~ conc, CO2, HC=TRUE) REG(uptake ~ conc, CO2, Resid=TRUE) REG(uptake ~ conc, CO2, HC=TRUE, Resid=TRUE) REG(uptake ~ conc, CO2, summarize=FALSE)
regD
provides rich diagnostics such as student residual, leverage(hat), Cook's D, studentized deleted residual, DFFITS, and DFBETAS.
regD(Formula, Data)
regD(Formula, Data)
Formula |
a conventional formula for a linear model |
Data |
a |
It performs the conventional regression analysis. This does not use g2 inverse, therefore it cannot handle a singular matrix. If the model(design) matrix is not full rank, use REG
or fewer parameters.
Coefficients |
conventional coefficients summary with Wald statistics |
Diagnostics |
Diagnostics table for detecting outlier or influential/leverage points. This includes fitted (Predicted), residual (Residual), standard error of residual(se_resid), studentized residual(RStudent), hat(Leverage), Cook's D, studentized deleted residual(sdResid), DIFFITS, and COVRATIO. |
DFBETAS |
Column names are the names of coefficients. Each row shows how much each coefficient is affected by deleting the corressponding row of observation. |
Kyun-Seop Bae [email protected]
regD(uptake ~ conc, CO2)
regD(uptake ~ conc, CO2)
Relative Risk between the two groups
RR(y1, n1, y2, n2, conf.level=0.95)
RR(y1, n1, y2, n2, conf.level=0.95)
y1 |
positive event count of test (the first) group |
n1 |
total count of the test (the first) group |
y2 |
positive event count of control (the second) group |
n2 |
total count of control (the second) group |
conf.level |
confidence level |
It calculates relative risk of the two groups. No continuity correction here. If you need percent scale, multiply the output by 100.
The result is a data.frame.
p1 |
proportion from the first group |
p2 |
proportion from the second group |
RR |
relative risk, p1/p2 |
SElog |
standard error of log(RR) |
lower |
lower confidence limit of RR |
upper |
upper confidence limit of RR |
Kyun-Seop Bae [email protected]
RD
, OR
, RDmn1
, RRmn1
, ORmn1
, RDmn
, RRmn
, ORmn
RR(104, 11037, 189, 11034) # no continuity correction
RR(104, 11037, 189, 11034) # no continuity correction
Relative risk and its score confidence interval of two groups with stratification by inverse variance method
RRinv(d0, conf.level=0.95)
RRinv(d0, conf.level=0.95)
d0 |
A data.frame or matrix, of which each row means a stratum. This should have four columns named y1, n1, y2, and n2; y1 and y2 for events of each group, n1 and n2 for sample size of each stratum. The second group is usually the control group. |
conf.level |
confidence level |
It calculates relative risk and its confidence interval of two groups by inverse variance method. This supports stratification. This can be used for meta-analysis also.
The following output will be returned for each stratum and common value. There is no standard error.
p1 |
proportion from the first group, y1/n1 |
p2 |
proportion from the second group, y2/n2 |
RR |
relative risk, p1/p2. The point estimate of common RR is calculated with MH weight. |
lower |
lower confidence limit of RR |
upper |
upper confidence limit of RR |
Kyun-Seop Bae [email protected]
RDmn1
, RRmn1
, ORmn1
, RDmn
, RRmn
, ORmn
, RDinv
, ORinv
, ORcmh
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") RRinv(d1)
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") RRinv(d1)
Relative risk and its score confidence interval of two groups with stratification by the Miettinen and Nurminen method
RRmn(d0, conf.level=0.95, eps=1e-8)
RRmn(d0, conf.level=0.95, eps=1e-8)
d0 |
A data.frame or matrix, of which each row means a strata. This should have four columns named y1, n1, y2, and n2; y1 and y2 for events of each group, n1 and n2 for sample size of each stratum. The second group is usually the control group. |
conf.level |
confidence level |
eps |
absolute value less than eps is regarded as negligible |
It calculates relative risk and its score confidence interval of the two groups. The confidence interval is asymmetric, and there is no standard error in the output. This supports stratification. This implementation uses uniroot function, which usually gives at least 5 significant digits. Whereas PropCIs::riskscoreci function uses cubic equation approximation which gives only about 2 significant digits. This can be used for meta-analysis also.
The following output will be returned for each strata and common value. There is no standard error.
p1 |
proportion from the first group, y1/n1 |
p2 |
proportion from the second group, y2/n2 |
RR |
relative risk, p1/p2. Point estimate of common RR is calculated with MN weight. |
lower |
lower confidence limit of RR |
upper |
upper confidence limit of RR |
Kyun-Seop Bae [email protected]
Miettinen O, Nurminen M. Comparative analysis of two rates. Stat Med 1985;4:213-26
RDmn1
, RRmn1
, ORmn1
, RDmn
, ORmn
, RDinv
, RRinv
, ORinv
, ORcmh
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") RRmn(d1)
d1 = matrix(c(25, 339, 28, 335, 23, 370, 40, 364), nrow=2, byrow=TRUE) colnames(d1) = c("y1", "n1", "y2", "n2") RRmn(d1)
Relative risk and its score confidence interval of the two groups without stratification
RRmn1(y1, n1, y2, n2, conf.level=0.95, eps=1e-8)
RRmn1(y1, n1, y2, n2, conf.level=0.95, eps=1e-8)
y1 |
positive event count of test (the first) group |
n1 |
total count of the test (the first) group |
y2 |
positive event count of control (the second) group |
n2 |
total count of control (the second) group |
conf.level |
confidence level |
eps |
absolute value less than eps is regarded as negligible |
It calculates the relative risk and its score confidence interval of the two groups. The confidence interval is asymmetric, and there is no standard error in the output. This does not support stratification. This implementation uses uniroot function, which usually gives at least 5 significant digits. Whereas PropCIs::riskscoreci function uses cubic equation approximation which gives only about 2 significant digits.
There is no standard error.
p1 |
proportion from the first group, y1/n1 |
p2 |
proportion from the second group, y2/n2 |
RR |
relative risk, p1/p2 |
lower |
lower confidence limit of RR |
upper |
upper confidence limit of RR |
Kyun-Seop Bae [email protected]
Miettinen O, Nurminen M. Comparative analysis of two rates. Stat Med 1985;4:213-26
RDmn1
, ORmn1
, RDmn
, RRmn
, ORmn
RRmn1(104, 11037, 189, 11034)
RRmn1(104, 11037, 189, 11034)
Calculates pooled variance and degree of freedom using Satterthwaite equation.
satt(vars, dfs, ws=c(1, 1))
satt(vars, dfs, ws=c(1, 1))
vars |
a vector of variances |
dfs |
a vector of degree of freedoms |
ws |
a vector of weights |
The input can be more than two variances.
Variance |
approximated variance |
Df |
degree of freedom |
Kyun-Seop Bae [email protected]
Score confidence of a proportion in one group
ScoreCI(y, n, conf.level=0.95)
ScoreCI(y, n, conf.level=0.95)
y |
positive event count of a group |
n |
total count of a group |
conf.level |
confidence level |
It calculates score confidence interval of a proportion in one group. The confidence interval is asymmetric and there is no standard error in the output. If you need percent scale, multiply the output by 100.
The result is a data.frame. There is no standard error.
PE |
point estimation for the proportion |
Lower |
lower confidence limit of Prop |
Upper |
upper confidence limit of Prop |
Kyun-Seop Bae [email protected]
ScoreCI(104, 11037)
ScoreCI(104, 11037)
Standard deviation of a sample.
SD(y)
SD(y)
y |
a vector of numerics |
It removes NA
in the input vector. The length of the vector should be larger than 1.
Sample standard deviation
Kyun-Seop Bae [email protected]
The estimate of the standard error of the sample mean
SEM(y)
SEM(y)
y |
a vector of numerics |
It removes NA
in the input vector.
The estimate of the standard error of the sample mean
Kyun-Seop Bae [email protected]
Sequential upper bounds for cumulative Z-test on accumaltive data. Z values are correlated. This is usually used for group sequential design.
seqBound(ti, alpha = 0.05, side = 2, t2 = NULL, asf = 1)
seqBound(ti, alpha = 0.05, side = 2, t2 = NULL, asf = 1)
ti |
times for test. These should be [0, 1]. |
alpha |
goal alpha value for the last test at time 0. |
side |
1=one-side test, 2=two-side test |
t2 |
fractions of information amount. These should be [0, 1]. If not available, ti will be used instead. |
asf |
alpha spending function. 1=O'Brien-Flemming, 2=Pocock, 3=alpha*ti, 4=alpha*ti^1.5, 5=alpha*ti^2 |
It calculates upper z-bounds and cumulative alpha-values for the repeated test in group sequential design. The correlation is assumed to be sqrt(t_i/t_j).
The result is a matrix.
ti |
time of test |
bi |
upper z-bound |
cum.alpha |
cumulative alpha-value |
Kyun-Seop Bae [email protected]
Reboussin DM, DeMets DL, Kim K, Lan KKG. Computations for group sequential boundaries using the Lan-DeMets function method. Controlled Clinical Trials. 2000;21:190-207.
seqBound(ti=(1:5)/5) seqBound(ti=(1:5)/5, asf=2)
seqBound(ti=(1:5)/5) seqBound(ti=(1:5)/5, asf=2)
Confidence interval with given upper bounds, time of tests, the last Z-value, and confidence level.
seqCI(bi, ti, Zval, conf.level=0.95)
seqCI(bi, ti, Zval, conf.level=0.95)
bi |
upper bound z-values |
ti |
times for test. These should be [0, 1]. |
Zval |
the last z-value from the observed data. This is not necessarily the planned final Z-value. |
conf.level |
confidence level |
It calculates confidence interval with given upper bounds, time of tests, the last Z-value, and confidence level. It assumes two-side test. mvtnorm::pmvt
(with noncentrality) is better than pmvnorm in calculating power, sample size, and confidence interval. But, Lan-DeMets used multi-variate normal rather than multi-variate noncentral t distributionh. This function followed Lan-DeMets for the consistency with previous results. For the theoretical background, see the reference.
confidence interval of Z-value for the given confidence level.
Kyun-Seop Bae [email protected]
Reboussin DM, DeMets DL, Kim K, Lan KKG. Computations for group sequential boundaries using the Lan-DeMets function method. Controlled Clinical Trials. 2000;21:190-207.
seqCI(bi = c(2.53, 2.61, 2.57, 2.47, 2.43, 2.38), ti = c(.2292, .3333, .4375, .5833, .7083, .8333), Zval=2.82)
seqCI(bi = c(2.53, 2.61, 2.57, 2.47, 2.43, 2.38), ti = c(.2292, .3333, .4375, .5833, .7083, .8333), Zval=2.82)
Skewness with a conventional formula.
Skewness(y)
Skewness(y)
y |
a vector of numerics |
It removes NA
in the input vector.
Estimate of skewness
Kyun-Seop Bae [email protected]
Standard errof of the skewness with a conventional formula.
SkewnessSE(y)
SkewnessSE(y)
y |
a vector of numerics |
It removes NA
in the input vector.
Standard error of the estimated skewness
Kyun-Seop Bae [email protected]
Do F test with a given slice term.
SLICE(Formula, Data, Term, By)
SLICE(Formula, Data, Term, By)
Formula |
a conventional formula for a linear model |
Data |
a |
Term |
a factor name (not interaction) to calculate the sum of square and do F test with least square means |
By |
a factor name to be used for slice |
It performs F test with a given slice term. It is similar to the SLICE option SAS PROC GLM.
Returns sum of square and its F value and p-value. Row names are the levels of the slice term.
Df |
degree of freedom |
Sum Sq |
sum of square for the set of contrasts |
Mean Sq |
mean square |
F value |
F value for the F distribution |
Pr(>F) |
proability of larger than F value |
Kyun-Seop Bae [email protected]
SLICE(uptake ~ Type*Treatment, CO2, "Type", "Treatment") SLICE(uptake ~ Type*Treatment, CO2, "Treatment", "Type")
SLICE(uptake ~ Type*Treatment, CO2, "Type", "Treatment") SLICE(uptake ~ Type*Treatment, CO2, "Treatment", "Type")
Sum of squares with ANOVA.
SS(x, rx, L, eps=1e-8)
SS(x, rx, L, eps=1e-8)
x |
a result of |
rx |
a result of |
L |
linear hypothesis, a full matrix matching the information in |
eps |
Less than this value is considered as zero. |
It calculates sum of squares and completes the ANOVA table.
ANOVA table |
a classical ANOVA table without the residual(Error) part. |
Kyun-Seop Bae [email protected]
Calculates a formula table for expected mean square of Type III SS.
T3MS(Formula, Data, L0, eps=1e-8)
T3MS(Formula, Data, L0, eps=1e-8)
Formula |
a conventional formula for a linear model |
Data |
a |
L0 |
a matrix of row linear contrasts, if missed, |
eps |
Less than this value is considered as zero. |
This is necessary for further hypothesis tests of nesting factors.
A coefficient matrix for Type III expected mean square
Kyun-Seop Bae [email protected]
T3MS(log(CMAX) ~ SEQ/SUBJ + PRD + TRT, BEdata)
T3MS(log(CMAX) ~ SEQ/SUBJ + PRD + TRT, BEdata)
Hypothesis test of Type III SS using an error term other than MSE. This corresponds to SAS PROC GLM's RANDOM /TEST clause.
T3test(Formula, Data, H="", E="", eps=1e-8)
T3test(Formula, Data, H="", E="", eps=1e-8)
Formula |
a conventional formula for a linear model |
Data |
a |
H |
Hypothesis term |
E |
Error term |
eps |
Less than this value is considered as zero. |
It tests a factor of type III SS using some other term as an error term. Here the error term should not be MSE.
Returns one or more ANOVA table(s) of type III SS.
Kyun-Seop Bae [email protected]
T3test(log(CMAX) ~ SEQ/SUBJ + PRD + TRT, BEdata, E=c("SEQ:SUBJ")) T3test(log(CMAX) ~ SEQ/SUBJ + PRD + TRT, BEdata, H="SEQ", E=c("SEQ:SUBJ"))
T3test(log(CMAX) ~ SEQ/SUBJ + PRD + TRT, BEdata, E=c("SEQ:SUBJ")) T3test(log(CMAX) ~ SEQ/SUBJ + PRD + TRT, BEdata, H="SEQ", E=c("SEQ:SUBJ"))
This produces essentially the same to t.test except using summarized input (sufficient statistics).
tmtest(m1, s1, n1, m0, s0, n0, conf.level=0.95, nullHypo=0, var.equal=F)
tmtest(m1, s1, n1, m0, s0, n0, conf.level=0.95, nullHypo=0, var.equal=F)
m1 |
mean of the first (test, active, experimental) group |
s1 |
sample standard deviation of the first group |
n1 |
sample size of the first group |
m0 |
mean of the second (reference, control, placebo) group |
s0 |
sample standard deviation of the second group |
n0 |
sample size of the second group |
conf.level |
confidence level |
nullHypo |
value for the difference of means under null hypothesis |
var.equal |
assumption on the variance equality |
The default is Welch t-test with Satterthwaite approximation.
The output format is very similar to t.test
Kyun-Seop Bae [email protected]
tmtest(5.4, 10.5, 3529, 5.1, 8.9, 5190) # NEJM 388;15 p1386 tmtest(5.4, 10.5, 3529, 5.1, 8.9, 5190, var.equal=TRUE)
tmtest(5.4, 10.5, 3529, 5.1, 8.9, 5190) # NEJM 388;15 p1386 tmtest(5.4, 10.5, 3529, 5.1, 8.9, 5190, var.equal=TRUE)
Trimmed mean wrapping mean
function.
trimmedMean(y, Trim=0.05)
trimmedMean(y, Trim=0.05)
y |
a vector of numerics |
Trim |
trimming proportion. Default is 0.05 |
It removes NA
in the input vector.
The value of trimmed mean
Kyun-Seop Bae [email protected]
Summarize a continuous dependent variable with or without independent variables.
tsum(Formula=NULL, Data=NULL, ColNames=NULL, MaxLevel=30, ...)
tsum(Formula=NULL, Data=NULL, ColNames=NULL, MaxLevel=30, ...)
Formula |
a conventional formula |
Data |
a data.frame or a matrix |
ColNames |
If there is no Formula, this will be used. |
MaxLevel |
More than this will not be handled. |
... |
arguments to be passed to |
A convenient summarization function for a continuous variable. This is a wrapper function to tsum0
, tsum1
, tsum2
, or tsum3
.
A data.frame of descriptive summarization values.
Kyun-Seop Bae [email protected]
tsum(lh) t(tsum(CO2)) t(tsum(uptake ~ Treatment, CO2)) tsum(uptake ~ Type + Treatment, CO2) print(tsum(uptake ~ conc + Type + Treatment, CO2), digits=3)
tsum(lh) t(tsum(CO2)) t(tsum(uptake ~ Treatment, CO2)) tsum(uptake ~ Type + Treatment, CO2) print(tsum(uptake ~ conc + Type + Treatment, CO2), digits=3)
Summarize a continuous dependent(y) variable without any independent(x) variable.
tsum0(d, y, e=c("Mean", "SD", "N"), repl=list(c("length"), c("n")))
tsum0(d, y, e=c("Mean", "SD", "N"), repl=list(c("length"), c("n")))
d |
a data.frame or matrix with colnames |
y |
y variable name, a continuous variable |
e |
a vector of summarize function names |
repl |
list of strings to replace after summarize. The length of list should be 2, and both should have the same length. |
A convenient summarization function for a continuous variable.
A vector of summarized values
Kyun-Seop Bae [email protected]
tsum0(CO2, "uptake") tsum0(CO2, "uptake", repl=list(c("mean", "length"), c("Mean", "n")))
tsum0(CO2, "uptake") tsum0(CO2, "uptake", repl=list(c("mean", "length"), c("Mean", "n")))
Summarize a continuous dependent(y) variable with one independent(x) variable.
tsum1(d, y, u, e=c("Mean", "SD", "N"), ou="", repl=list(c("length"), ("n")))
tsum1(d, y, u, e=c("Mean", "SD", "N"), ou="", repl=list(c("length"), ("n")))
d |
a data.frame or matrix with colnames |
y |
y variable name. a continuous variable |
u |
x variable name, upper side variable |
e |
a vector of summarize function names |
ou |
order of levels of upper side x variable |
repl |
list of strings to replace after summarize. The length of list should be 2, and both should have the same length. |
A convenient summarization function for a continuous variable with one x variable.
A data.frame of summarized values. Row names are from e
names. Column names are from the levels of x variable.
Kyun-Seop Bae [email protected]
tsum1(CO2, "uptake", "Treatment") tsum1(CO2, "uptake", "Treatment", e=c("mean", "median", "sd", "min", "max", "length"), ou=c("chilled", "nonchilled"), repl=list(c("median", "length"), c("med", "n")))
tsum1(CO2, "uptake", "Treatment") tsum1(CO2, "uptake", "Treatment", e=c("mean", "median", "sd", "min", "max", "length"), ou=c("chilled", "nonchilled"), repl=list(c("median", "length"), c("med", "n")))
Summarize a continuous dependent(y) variable with two independent(x) variables.
tsum2(d, y, l, u, e=c("Mean", "SD", "N"), h=NULL, ol="", ou="", rm.dup=TRUE, repl=list(c("length"), c("n")))
tsum2(d, y, l, u, e=c("Mean", "SD", "N"), h=NULL, ol="", ou="", rm.dup=TRUE, repl=list(c("length"), c("n")))
d |
a data.frame or matrix with colnames |
y |
y variable name. a continuous variable |
l |
x variable name to be shown on the left side |
u |
x variable name to be shown on the upper side |
e |
a vector of summarize function names |
h |
a vector of summarize function names for the horizontal subgroup. If |
ol |
order of levels of left side x variable |
ou |
order of levels of upper side x variable |
rm.dup |
if |
repl |
list of strings to replace after summarize. The length of list should be 2, and both should have the same length. |
A convenient summarization function for a continuous variable with two x variables; one on the left side, the other on the upper side.
A data.frame of summarized values. Column names are from the levels of u
. Row names are basically from the levels of l
.
Kyun-Seop Bae [email protected]
tsum2(CO2, "uptake", "Type", "Treatment") tsum2(CO2, "uptake", "Type", "conc") tsum2(CO2, "uptake", "Type", "Treatment", e=c("mean", "median", "sd", "min", "max", "length"), ou=c("chilled", "nonchilled"), repl=list(c("median", "length"), c("med", "n")))
tsum2(CO2, "uptake", "Type", "Treatment") tsum2(CO2, "uptake", "Type", "conc") tsum2(CO2, "uptake", "Type", "Treatment", e=c("mean", "median", "sd", "min", "max", "length"), ou=c("chilled", "nonchilled"), repl=list(c("median", "length"), c("med", "n")))
Summarize a continuous dependent(y) variable with three independent(x) variables.
tsum3(d, y, l, u, e=c("Mean", "SD", "N"), h=NULL, ol1="", ol2="", ou="", rm.dup=TRUE, repl=list(c("length"), c("n")))
tsum3(d, y, l, u, e=c("Mean", "SD", "N"), h=NULL, ol1="", ol2="", ou="", rm.dup=TRUE, repl=list(c("length"), c("n")))
d |
a data.frame or matrix with colnames |
y |
y variable name. a continuous variable |
l |
a vector of two x variable names to be shown on the left side. The length should be 2. |
u |
x variable name to be shown on the upper side |
e |
a vector of summarize function names |
h |
a list of two vectors of summarize function names for the first and second horizontal subgroups. If |
ol1 |
order of levels of 1st left side x variable |
ol2 |
order of levels of 2nd left side x variable |
ou |
order of levels of upper side x variable |
rm.dup |
if |
repl |
list of strings to replace after summarize. The length of list should be 2, and both should have the same length. |
A convenient summarization function for a continuous variable with three x variables; two on the left side, the other on the upper side.
A data.frame of summarized values. Column names are from the levels of u
. Row names are basically from the levels of l
.
Kyun-Seop Bae [email protected]
tsum3(CO2, "uptake", c("Type", "Treatment"), "conc") tsum3(CO2, "uptake", c("Type", "Treatment"), "conc", e=c("mean", "median", "sd", "min", "max", "length"), h=list(c("mean", "sd", "length"), c("mean", "length")), ol2=c("chilled", "nonchilled"), repl=list(c("median", "length"), c("med", "n")))
tsum3(CO2, "uptake", c("Type", "Treatment"), "conc") tsum3(CO2, "uptake", c("Type", "Treatment"), "conc", e=c("mean", "median", "sd", "min", "max", "length"), h=list(c("mean", "sd", "length"), c("mean", "length")), ol2=c("chilled", "nonchilled"), repl=list(c("median", "length"), c("med", "n")))
This is comparable to SAS PROC TTEST.
TTEST(x, y, conf.level=0.95)
TTEST(x, y, conf.level=0.95)
x |
a vector of data from the first (test, active, experimental) group |
y |
a vector of data from the second (reference, control, placebo) group |
conf.level |
confidence level |
Caution on choosing the row to use in the output.
The output format is comparable to SAS PROC TTEST.
Kyun-Seop Bae [email protected]
TTEST(mtcars[mtcars$am==1, "mpg"], mtcars[mtcars$am==0, "mpg"])
TTEST(mtcars[mtcars$am==1, "mpg"], mtcars[mtcars$am==0, "mpg"])
The estimate of the upper bound of the confidence limit using t-distribution
UCL(y, conf.level=0.95)
UCL(y, conf.level=0.95)
y |
a vector of numerics |
conf.level |
confidence level |
It removes NA
in the input vector.
The estimate of the upper bound of the confidence limit using t-distribution
Kyun-Seop Bae [email protected]
Returns descriptive statistics of a numeric vector.
UNIV(y, conf.level = 0.95)
UNIV(y, conf.level = 0.95)
y |
a numeric vector |
conf.level |
confidence level for confidence limit |
A convenient and comprehensive function for descriptive statistics. NA is removed during the calculation. This is similar to SAS PROC UNIVARIATE.
nAll |
count of all elements in the input vector |
nNA |
count of NA element |
nFinite |
count of finite numbers |
Mean |
mean excluding NA |
SD |
standard deviation excluding NA |
CV |
coefficient of variation in percent |
SEM |
standard error of the sample mean, the sample mean divided by nFinite |
LowerCL |
lower confidence limit of mean |
UpperCL |
upper confidence limit of mean |
TrimmedMean |
trimmed mean with trimming 1 - confidence level |
Min |
minimum value |
Q1 |
first quartile value |
Median |
median value |
Q3 |
third quartile value |
Max |
maximum value |
Range |
range of finite numbers. maximum - minimum |
IQR |
inter-quartile range type 2, which is SAS default |
MAD |
median absolute deviation |
VarLL |
lower confidence limit of variance |
VarUL |
upper confidence limit of variance |
Skewness |
skewness |
SkewnessSE |
standard error of skewness |
Kurtosis |
kurtosis |
KurtosisSE |
kurtosis |
GeometricMean |
geometric mean, calculated only when all given values are positive. |
GeometricCV |
geometric coefficient of variation in percent, calculated only when all given values are positive. |
Kyun-Seop Bae [email protected]
UNIV(lh)
UNIV(lh)
F-test for the ratio of two groups' variances. This is similar to var.test except using the summarized input.
vtest(v1, n1, v0, n0, ratio=1, conf.level=0.95)
vtest(v1, n1, v0, n0, ratio=1, conf.level=0.95)
v1 |
sample variance of the first (test, active, experimental) group |
n1 |
sample size of the first group |
v0 |
sample variance of the second (reference, control, placebo) group |
n0 |
sample size of the second group |
ratio |
value for the ratio of variances under null hypothesis |
conf.level |
confidence level |
For the confidence interval of one group, use UNIV function.
The output format is very similar to var.test.
Kyun-Seop Bae [email protected]
vtest(10.5^2, 5190, 8.9^2, 3529) # NEJM 388;15 p1386 vtest(2.3^2, 13, 1.5^2, 11, conf.level=0.9) # Red book p240
vtest(10.5^2, 5190, 8.9^2, 3529) # NEJM 388;15 p1386 vtest(2.3^2, 13, 1.5^2, 11, conf.level=0.9) # Red book p240
This is shown in SAS PROC REG as the Test of First and Second Moment Specification.
WhiteTest(rx)
WhiteTest(rx)
rx |
a result of lm |
This is also called as White's general test for heteroskedasticity.
Returns a direct test result by more coomplex theorem 2 , not by simpler corollary 1.
Kyun-Seop Bae [email protected]
White H. A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica 1980;48(4):817-838.
WhiteTest(lm(mpg ~ disp, mtcars))
WhiteTest(lm(mpg ~ disp, mtcars))
This is similar to two groups t-test, but using standard normal (Z) distribution.
ztest(m1, s1, n1, m0, s0, n0, conf.level=0.95, nullHypo=0)
ztest(m1, s1, n1, m0, s0, n0, conf.level=0.95, nullHypo=0)
m1 |
mean of the first (test, active, experimental) group |
s1 |
known standard deviation of the first group |
n1 |
sample size of the first group |
m0 |
mean of the second (reference, control, placebo) group |
s0 |
known standard deviationo of the second group |
n0 |
sample size of the second group |
conf.level |
confidence level |
nullHypo |
value for the difference of means under null hypothesis |
Use this only for known standard deviations (or variances) or very large sample sizes per group.
The output format is very similar to t.test
Kyun-Seop Bae [email protected]
ztest(5.4, 10.5, 3529, 5.1, 8.9, 5190) # NEJM 388;15 p1386
ztest(5.4, 10.5, 3529, 5.1, 8.9, 5190) # NEJM 388;15 p1386