Title: | A few Useful Functions for Statisticians |
---|---|
Description: | Various useful functions for statisticians: describe data, plot Kaplan-Meier curves with numbers of subjects at risk, compare data sets, display spaghetti-plot, build multi-contingency tables... |
Authors: | Hugo Varet |
Maintainer: | Hugo Varet <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.2 |
Built: | 2024-12-15 07:27:47 UTC |
Source: | CRAN |
Compares two data frames assumed to be identical, prints the differences in the console and also returns the results in a data frame
compare(d1, d2, id, file.export = NULL)
d1 | first data frame |
d2 | second data frame |
id | character string, primary key of the two data frames |
file.export | character string, name of the XLS file exported |
A data frame containing the differences between the two data frames
Hugo Varet
N=100
data1=data.frame(id=1:N,a=rnorm(N),
                 b=factor(sample(LETTERS[1:5],N,TRUE)),
                 c=as.character(sample(LETTERS[1:5],N,TRUE)),
                 d=as.Date(32768:(32768+N-1),origin="1900-01-01"))
data1$c=as.character(data1$c)
data2=data1
data2$id[3]=4654
data2$a[30]=NA
data2$a[31]=45
data2$b=as.character(data2$b)
data2$d=as.character(data2$d)
data2$e=rnorm(N)
compare(data1,data2,"id")
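To make the idea of a key-based comparison concrete, here is a minimal base-R sketch. `diff_cells` is a hypothetical helper written for illustration only; it is not the package's implementation, it ignores keys or columns present in just one data frame, and it compares values as character strings (an assumption made for simplicity).

```r
# Hypothetical helper (not part of the package): list the cells that differ
# between two data frames sharing a key column.
diff_cells <- function(d1, d2, id) {
  common_ids  <- intersect(d1[[id]], d2[[id]])
  common_vars <- setdiff(intersect(names(d1), names(d2)), id)
  # Align the rows of both data frames on the shared key values
  r1 <- d1[match(common_ids, d1[[id]]), , drop = FALSE]
  r2 <- d2[match(common_ids, d2[[id]]), , drop = FALSE]
  out <- lapply(common_vars, function(v) {
    v1 <- as.character(r1[[v]])
    v2 <- as.character(r2[[v]])
    # A cell differs if exactly one side is NA, or both are present and unequal
    differs <- xor(is.na(v1), is.na(v2)) | (!is.na(v1) & !is.na(v2) & v1 != v2)
    data.frame(id = common_ids[differs], variable = rep(v, sum(differs)),
               d1 = v1[differs], d2 = v2[differs], stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

a <- data.frame(id = 1:4, x = c(1, 2, 3, 4), y = c("u", "v", "w", "z"))
b <- data.frame(id = 1:4, x = c(1, NA, 3, 5), y = c("u", "v", "k", "z"))
res <- diff_cells(a, b, "id")
print(res)
```

Comparing as character strings means a factor and a character column holding the same labels count as equal; a stricter check would also compare column classes.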
Converts variables of a data frame into factors
convert_factor(data, vars)
data | the data frame in which we can find |
vars | vector of character strings of covariates |
The modified data frame
Hugo Varet
cgd$steroids
cgd$status
cgd=convert_factor(cgd,c("steroids","status"))
Converts 0s to NA
convert_zero_NA(data, vars)
data | the data frame in which we can find |
vars | a character vector of covariates for which to transform 0s in |
The modified data frame
Hugo Varet
my.data=data.frame(x=rbinom(20,1,0.5),y=rbinom(20,1,0.5),z=rbinom(20,1,0.5))
my.data=convert_zero_NA(my.data,c("y","z"))
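As a rough illustration of the transformation, the base-R sketch below replaces zeros with `NA` in selected columns. `zero_to_na` is a hypothetical stand-in, not the package's code, and it assumes the listed columns are numeric.

```r
# Hypothetical stand-in for illustration; not the package's code.
zero_to_na <- function(data, vars) {
  for (v in vars) {
    # which() skips existing NAs, so only true zeros are replaced
    data[[v]][which(data[[v]] == 0)] <- NA
  }
  data
}

df <- data.frame(x = c(0, 1, 0), y = c(0, 2, NA), z = c(3, 0, 0))
df2 <- zero_to_na(df, c("y", "z"))
print(df2)
```

Columns not named in `vars` (here `x`) are left untouched.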
Cuts a quantitative variable into equal parts
cut_quanti(x, n, ...)
x | a numeric vector |
n | numeric, the number of parts: 2 to cut according to the median, and so on... |
... | other arguments to be passed in |
A factor vector
Hugo Varet
cut_quanti(cgd$height, 3)
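One plausible way to obtain such a split in base R is to cut at empirical quantiles. The sketch below is an assumption about the approach, not the package's code: `cut_equal` is a hypothetical name, and forwarding `...` to `cut()` is a choice made here for illustration.

```r
# Hypothetical sketch: split a numeric vector into n quantile-based groups.
cut_equal <- function(x, n, ...) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  # include.lowest = TRUE keeps the minimum inside the first interval
  cut(x, breaks = breaks, include.lowest = TRUE, ...)
}

set.seed(1)
x <- rnorm(90)
groups <- cut_equal(x, 3)
print(table(groups))
```

With heavily tied data the quantile breaks may coincide, in which case `cut()` stops with an error; a production version would need to handle that case.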
Computes descriptive statistics of a data frame, optionally by group covariate, and can export the results
desc(data, vars, group = NULL, whole = TRUE, vars.labels = vars, group.labels = NULL, type.quanti = "mean", test.quanti = "param", test = TRUE, noquote = TRUE, justify = TRUE, digits = 2, file.export = NULL, language = "english")
data | data frame to describe in which we can find |
vars | vector of character strings of the covariates to describe |
group | character string, statistics created for each level of this covariate |
whole | boolean, |
vars.labels | vector of character strings giving nicer covariate names in the output |
group.labels | vector of character strings giving nicer column names |
type.quanti | character string, |
test.quanti | character string, |
test | boolean, |
noquote | boolean, |
justify | boolean, |
digits | number of digits of the statistics (mean, sd, median, min, max, Q1, Q3, %); p-values always have 3 digits |
file.export | character string, name of the XLS file exported |
language | character string, |
A matrix of the descriptive statistics
Hugo Varet
cgd$steroids=factor(cgd$steroids)
cgd$status=factor(cgd$status)
desc(cgd,vars=c("center","sex","age","height","weight","steroids","status"),group="treat")
Plots a histogram with a boxplot below
hist_boxplot(x, freq = TRUE, density = FALSE, main = NULL, xlab = NULL, ymax = NULL, ...)
x | a numeric vector |
freq | boolean, |
density | boolean, |
main | character string, main title of the histogram |
xlab | character string, label of the x axis |
ymax | numeric value, maximum of the y axis |
... | other arguments to be passed in |
None
Hugo Varet
par(mfrow=c(1,2))
hist_boxplot(rnorm(100),col="lightblue",freq=TRUE)
hist_boxplot(rnorm(100),col="lightblue",freq=FALSE,density=TRUE)
Computes odds ratios and their confidence intervals for logistic regressions
IC_OR_glm(model, alpha = 0.05)
model | a |
alpha | type I error, 0.05 by default |
A matrix with the estimated coefficients of the logistic model, their s.e., z-values, p-values, OR and CI of the OR
Hugo Varet
IC_OR_glm(glm(inherit~sex+age,data=cgd,family="binomial"))
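The returned quantities can be reproduced by hand from a fitted logistic model. The sketch below assumes Wald-type intervals, i.e. exp(estimate ± z × s.e.) with z the 1 − alpha/2 normal quantile; whether the package uses exactly this construction is not stated above, and `or_table` is a hypothetical name.

```r
# Hypothetical sketch (not the package's code): Wald odds ratios and CIs.
or_table <- function(model, alpha = 0.05) {
  est <- summary(model)$coefficients   # Estimate, Std. Error, z value, Pr(>|z|)
  z   <- qnorm(1 - alpha / 2)          # two-sided normal quantile
  cbind(est,
        OR    = exp(est[, "Estimate"]),
        lower = exp(est[, "Estimate"] - z * est[, "Std. Error"]),
        upper = exp(est[, "Estimate"] + z * est[, "Std. Error"]))
}

# Simulated data, used only to have a model to summarise
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 * x))
fit <- glm(y ~ x, family = binomial)
res <- or_table(fit)
print(round(res, 3))
```

Because the interval is built on the log-odds scale and then exponentiated, it is not symmetric around the odds ratio.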
Computes risk ratios and their confidence intervals for Cox models
IC_RR_coxph(model, alpha = 0.05, sided = 2)
model | a |
alpha | type I error, 0.05 by default |
sided | 1 or 2 for one or two-sided |
A matrix with the estimated coefficients of the Cox model, their s.e., z-values, p-values, RR and CI of the RR
Hugo Varet
cgd$time=cgd$tstop-cgd$tstart
IC_RR_coxph(coxph(Surv(time,status)~sex+age,data=cgd),alpha=0.05,sided=1)
Builds a big cross table between several covariates
multi.table(data, vars)
data | the data frame in which we can find |
vars | vector of character strings of covariates |
A matrix containing all the contingency tables between the covariates
Hugo Varet
multi.table(cgd,c("treat","sex","inherit"))
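The building blocks of such a cross table are the pairwise contingency tables. The base-R sketch below computes one table per pair of covariates and returns them as a list rather than a single matrix; `pairwise_tables` is a hypothetical helper, and the built-in `mtcars` data set is used only for illustration.

```r
# Hypothetical sketch: one contingency table per pair of covariates.
pairwise_tables <- function(data, vars) {
  pairs <- combn(vars, 2, simplify = FALSE)
  tabs <- lapply(pairs, function(p) {
    # dnn labels the two dimensions with the covariate names
    table(data[[p[1]]], data[[p[2]]], dnn = p, useNA = "ifany")
  })
  names(tabs) <- vapply(pairs, paste, character(1), collapse = " x ")
  tabs
}

tabs <- pairwise_tables(mtcars, c("cyl", "gear", "am"))
print(tabs[["cyl x gear"]])
```

Stacking these tables into one large matrix, as the function's return value describes, would be an additional layout step on top of this.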
Kaplan-Meier plot with number of subjects at risk below
plot_km(formula, data, test = TRUE, xy.pvalue = NULL, conf.int = FALSE, times.print = NULL, nrisk.labels = NULL, legend = NULL, xlab = NULL, ylab = NULL, ylim = c(0, 1.02), left = 4.5, bottom = 5, cex.mtext = par("cex"), lwd = 2, lty = 1, col = NULL, ...)
formula | same formula as in |
data | data frame with |
test | boolean, |
xy.pvalue | numeric vector of length 2, coordinates where to display the p-value of the log-rank test |
conf.int | boolean, |
times.print | numeric vector, times at which to display the numbers of subjects at risk |
nrisk.labels | character vector to modify the levels of |
legend | character string ( |
xlab | character string, label of the time axis |
ylab | character string, label of the y axis |
ylim | numeric vector of length 2, minimum and maximum of the y-axis |
left | integer, size of left margin |
bottom | integer, number of lines in addition to the table below the graph |
cex.mtext | numeric, size of the numbers of subjects at risk |
lwd | width of the Kaplan-Meier curve(s) |
lty | type of the Kaplan-Meier curve(s) |
col | color(s) of the Kaplan-Meier curve(s) |
... | other arguments to be passed in |
None
Hugo Varet
cgd$time=cgd$tstop-cgd$tstart
plot_km(Surv(time,status)~sex,data=cgd,col=c("blue","red"))
Spaghetti plot and plot of the mean at each time
plot_mm(formula, data, col.spag = 1, col.mean = 1, type = "spaghettis", tick.times = TRUE, xlab = NULL, ylab = NULL, main = "", lwd.spag = 1, lwd.mean = 4, ...)
formula | |
data | data frame in which we can find |
col.spag | vector of length |
col.mean | vector of length |
type | |
tick.times | boolean, |
xlab | character string, label of the time axis |
ylab | character string, label of the y axis |
main | character string, main title |
lwd.spag | numeric, width of the spaghetti lines, 1 by default |
lwd.mean | numeric, width of the mean lines, 4 by default |
... | other arguments to be passed in |
None
Hugo Varet, based on an idea by Anais Charles-Nelson
N=10
time=rep(1:4,N)
obs=1.1*time + rep(0:1,each=2*N) + rnorm(4*N)
my.data=data.frame(id=rep(1:N,each=4),time,obs,group=rep(1:2,each=N*2))
par(xaxs="i",yaxs="i")
plot_mm(obs~time+(group|id),my.data,col.spag=my.data$group,
        col.mean=c("blue","red"),type="both",main="Test plot_mm")
Plots a multi cross table on a graph
plot_multi.table(data, vars, main = "")
data | the data frame in which we can find |
vars | vector of character strings of covariates |
main | main title of the plot |
None
Hugo Varet
plot_multi.table(cgd,c("treat","sex","inherit"))
Plots points with the corresponding linear regression line
plot_reg(x, y, pch = 19, xlab = NULL, ylab = NULL, ...)
x | numeric vector |
y | numeric vector |
pch | type of points |
xlab | character string, label of the x axis, |
ylab | character string, label of the y axis, |
... | other arguments to be passed in |
None
Hugo Varet
plot_reg(cgd$age,cgd$height,xlab="Age (years)",ylab="Height")
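A scatter plot with its regression line can be drawn in base R with `plot()`, `lm()` and `abline()`. The sketch below is only an approximation of the idea: `scatter_with_line` is a hypothetical name, the data are simulated, and returning the fitted model invisibly is a choice made here, whereas the function documented above returns nothing.

```r
# Hypothetical sketch: scatter plot with the fitted least-squares line.
scatter_with_line <- function(x, y, pch = 19, ...) {
  fit <- lm(y ~ x)
  plot(x, y, pch = pch, ...)
  abline(fit)        # add the fitted regression line
  invisible(fit)     # return the fit so it can be inspected
}

# Simulated data with a known slope of 0.5
set.seed(3)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 0.3)
fit <- scatter_with_line(x, y, xlab = "x", ylab = "y")
print(coef(fit))
```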