Title: | Miscellaneous Esoteric Statistical Scripts |
---|---|
Description: | A mixed collection of useful and semi-useful diverse statistical functions, some of which may even be referenced in The R Primer book. |
Authors: | Claus Thorn Ekstrøm [aut, cre], Niels Aske Lundtorp Olsen [ctb] |
Maintainer: | Claus Thorn Ekstrøm <[email protected]> |
License: | GPL-2 |
Version: | 0.5.12 |
Built: | 2024-12-15 07:46:32 UTC |
Source: | CRAN |
Fast computation of weights needed for adaptive lasso based on Gaussian family data.
adaptive.weights(x, y, nu = 1, weight.method = c("multivariate", "univariate"))
adaptive.weights(x, y, nu = 1, weight.method = c("multivariate", "univariate"))
x |
input matrix, of dimension nobs x nvars; each row is an observation vector. |
y |
response variable. |
nu |
non-negative tuning parameter |
weight.method |
Should the weights be computed for multivariate regression model (only possible when the number of observations is larger than the number of parameters) or by individual marginal/"univariate" regression coefficients. |
The weights returned are 1/abs(beta_hat)^nu where the beta-parameters are estimated from the corresponding linear model (either multivariate or univariate).
Returns a list with two elements:
weights |
the computed weights |
nu |
the value of nu used for the computations |
Claus Ekstrom [email protected]
Xou, H (2006). The Adaptive Lasso and Its Oracle Properties. JASA, Vol. 101.
glmnet
set.seed(1) x <- matrix(rnorm(50000), nrow=50) y <- rnorm(50, mean=x[,1]) weights <- adaptive.weights(x, y) if (requireNamespace("glmnet", quietly = TRUE)) { res <- glmnet::glmnet(x, y, penalty.factor=weights$weights) head(res) }
set.seed(1) x <- matrix(rnorm(50000), nrow=50) y <- rnorm(50, mean=x[,1]) weights <- adaptive.weights(x, y) if (requireNamespace("glmnet", quietly = TRUE)) { res <- glmnet::glmnet(x, y, penalty.factor=weights$weights) head(res) }
Fast addition of vector to each row of a matrix. This corresponds to t(t(x) + v)
add_torows(x, v)
add_torows(x, v)
x |
A matrix with dimensions n*k. |
v |
A vector of length k. |
A matrix of dimension n*k where v is added to each row of x
Claus Ekstrom <[email protected]>
A <- matrix(1:12, ncol=3) B <- c(1, 2, 3) add_torows(A, B)
A <- matrix(1:12, ncol=3) B <- c(1, 2, 3) add_torows(A, B)
Compute the age in years of an individual based on the birth date and another (subsequent) date
age(from, to)
age(from, to)
from |
a vector of dates (birth dates) |
to |
a vector of current dates |
Returns the full number of years that a person is old on a given date.
A vector of ages (in years)
Claus Ekstrom [email protected]
born <- c("1971-08-18", "2000-02-28", "2001-12-20") check <- c("2016-08-28") age(born, check)
born <- c("1971-08-18", "2000-02-28", "2001-12-20") check <- c("2016-08-28") age(born, check)
Compute the area under the curve using linear or natural spline interpolation for two vectors where one corresponds to the x values and the other corresponds to the y values.
auc( x, y, from = min(x, na.rm = TRUE), to = max(x, na.rm = TRUE), type = c("linear", "spline"), absolutearea = FALSE, subdivisions = 100, ... )
auc( x, y, from = min(x, na.rm = TRUE), to = max(x, na.rm = TRUE), type = c("linear", "spline"), absolutearea = FALSE, subdivisions = 100, ... )
x |
a numeric vector of x values. |
y |
a numeric vector of y values of the same length as x. |
from |
The value from where to start calculating the area under the curve. Defaults to the smallest x value. |
to |
The value from where to end the calculation of the area under the curve. Defaults to the greatest x value. |
type |
The type of interpolation. Defaults to "linear" for area under the curve for linear interpolation. The value "spline" results in the area under the natural cubic spline interpolation. |
absolutearea |
A logical value that determines if negative
areas should be added to the total area under the curve. By
default the auc function subtracts areas that have negative y
values. Set |
subdivisions |
an integer telling how many subdivisions should be used for integrate (for non-linear approximations) |
... |
additional arguments passed on to approx (for linear approximations). In particular rule can be set to determine how values outside the range of x is handled. |
For linear interpolation the auc function computes the area under the curve using the composite trapezoid rule. For area under a spline interpolation, auc uses the splinefun function in combination with the integrate to calculate a numerical integral. The auc function can handle unsorted time values, missing observations, ties for the time values, and integrating over part of the area or even outside the area.
The value of the area under the curve.
Claus Ekstrom [email protected]
x <- 1:4 y <- c(0, 1, 1, 5) auc(x, y) # AUC from 0 to max(x) where we allow for extrapolation auc(x, y, from=0, rule=2) # Use value 0 to the left auc(x, y, from=0, rule=2, yleft=0) # Use 1/2 to the left auc(x, y, from=0, rule=2, yleft=.5) # Use 1/2 to the left with spline interpolation auc(x, y, from=0, rule=2, yleft=.5)
x <- 1:4 y <- c(0, 1, 1, 5) auc(x, y) # AUC from 0 to max(x) where we allow for extrapolation auc(x, y, from=0, rule=2) # Use value 0 to the left auc(x, y, from=0, rule=2, yleft=0) # Use 1/2 to the left auc(x, y, from=0, rule=2, yleft=.5) # Use 1/2 to the left with spline interpolation auc(x, y, from=0, rule=2, yleft=.5)
Monthly live births and deaths in Denmark from January 1901 to March 2013.
A data frame with 1356 observations on the following 4 variables.
a numeric vector giving the month
a numeric vector giving the year
a numeric vector. The number of births for the given month and year
a numeric vector. The number of deaths for the given month and year
Data were obtained from the StatBank from Danmarks Statistik, see http://www.statbank.dk.
data(bdstat) plot(bdstat$year + bdstat$month/13, bdstat$birth, type="l") # Create a table of births # Remove year 2013 as it is incomplete btable <- xtabs(births ~ year + month, data=bdstat, subset=(year<2013)) # Compute yearly birth frequencies per month btable.freq <- prop.table(btable, margin=1)
data(bdstat) plot(bdstat$year + bdstat$month/13, bdstat$birth, type="l") # Create a table of births # Remove year 2013 as it is incomplete btable <- xtabs(births ~ year + month, data=bdstat, subset=(year<2013)) # Compute yearly birth frequencies per month btable.freq <- prop.table(btable, margin=1)
Number of different types of bees caught in plates of different colours. There are four locations and within each location there are three replicates consisting of three plates of the three different colours (yellow, white and blue). Data are collected at 5 different dates over the summer season. Only data from one date available until data has been published.
A data frame with 72 observations on the following 7 variables.
a factor with levels Havreholm
Kragevig
Saltrup
Svaerdborg
. Four different localities
in Denmark.
a factor with levels A
B
C
a factor with levels Blue
White
Yellow
. Colour of plates
a factor with levels
july1
july14
june17
june3
june6
. Data
collected at different dates in summer season. Only one day is present in
the current data frame until the full data has been released.
a factor with levels Bumblebees
Solitary
.
Type of bee.
a numeric vector. The response variable with number of bees catched.
a numeric vector. The id of the clusters (each containg three plates).
Data were kindly provided by Casper Ingerslev Henriksen, Department of Agricultural Sciences, KU-LIFE. Added by Torben Martinussen <[email protected]>
data(bees) model <- glm(Number ~ Locality + Type*Color, family=poisson, data=bees)
data(bees) model <- glm(Number ~ Locality + Type*Color, family=poisson, data=bees)
Fast binning of numeric vector into equidistant bins
bin(x, width, origin = 0, missinglast = FALSE)
bin(x, width, origin = 0, missinglast = FALSE)
x |
A matrix of regressor variables. Must have the same number of rows as the length of y. |
width |
The width of the bins |
origin |
The starting point for the bins. Any number smaller than origin will be disregarded |
missinglast |
Boolean. Should the missing observations be added as a separate element at the end of the returned count vector. |
Missing values (NA, Inf, NaN) are added at the end of the vector as the last bin returned if missinglast is set to TRUE
An list with elements counts (the frequencies), origin (the origin), width (the width), missing (the number of missings), and last_bin_is_missing (boolean) telling whether the missinglast is true or not.
Hadley Wickham (from SO: https://stackoverflow.com/questions/13661065/superimpose-histogram-fits-in-one-plot-ggplot) - adapted here by Claus Ekstrøm <[email protected]>
set.seed(1) x <- sample(10, 20, replace = TRUE) bin(x, 15)
set.seed(1) x <- sample(10, 20, replace = TRUE) bin(x, 15)
Accepts a data frame as input and computes a contingency table for direct use in combination with the magrittr package.
categorize(.data, ...)
categorize(.data, ...)
.data |
A data frame |
... |
A formula (as in xtabs) or one or more objects which can be interpreted as factors (including character strings), or a list (or data frame) whose components can be so interpreted. |
categorize is a wrapper to xtabs or table such that a data frame can be given as the first argument.
A table (possibly as an xtabs class if a model formula was used)
Claus Ekstrom [email protected]
if (requireNamespace("magrittr", quietly = TRUE)) { library(magrittr) esoph %>% categorize(alcgp, agegp) esoph %>% categorize(~ alcgp + agegp) }
if (requireNamespace("magrittr", quietly = TRUE)) { library(magrittr) esoph %>% categorize(alcgp, agegp) esoph %>% categorize(~ alcgp + agegp) }
Copies an R object to the clipboard so it can be pasted in elsewhere.
clipit(x)
clipit(x)
x |
object to copy |
Returns nothing but will place the object in the clipboard
Nothing but will put the R object into the clipboard as a side effect
Jonas LindeLøv posted on twitter. Copied shamelessly by Claus Ekstrom [email protected]
## Not run: clipit(mtcars$mpg) ## End(Not run)
## Not run: clipit(mtcars$mpg) ## End(Not run)
Blood clotting activity (PCA) is measured for 158 Norway rats from two locations just before (baseline) and four days after injection of an anticoagulant (bromadiolone). Normally this would cause reduced blood clotting after 4 days compared to the baseline, but these rats are known to possess anticoagulent resistence to varying extent. The purpose is to relate anticoagulent resistence to gender and location and perhaps weight. Dose of injection is, however, admistered according to weight and gender.
A data frame with 158 observations on the following 6 variables.
a numeric vector
a
factor with levels Loc1
Loc2
a factor with
levels F
M
a numeric vector
a numeric vector with percent blood clotting activity at baseline
a numeric vector with percent blood clotting activity on day 4
Ann-Charlotte Heiberg, project at The Royal Veterinary and
Agricultural University, 1999.
Added by Ib M. Skovgaard <[email protected]>
data(clotting) dim(clotting) head(clotting) day0= transform(clotting, day=0, pca=PCA0) day4= transform(clotting, day=4, pca=PCA4) day.both= rbind(day0,day4) m1= lm(pca ~ rat + day*locality + day*sex, data=day.both) anova(m1) summary(m1) m2= lm(pca ~ rat + day, data=day.both) anova(m2) ## Log transformation suggested. ## Random effect of rat. ## maybe str(clotting) ; plot(clotting) ...
data(clotting) dim(clotting) head(clotting) day0= transform(clotting, day=0, pca=PCA0) day4= transform(clotting, day=4, pca=PCA4) day.both= rbind(day0,day4) m1= lm(pca ~ rat + day*locality + day*sex, data=day.both) anova(m1) summary(m1) m2= lm(pca ~ rat + day, data=day.both) anova(m2) ## Log transformation suggested. ## Random effect of rat. ## maybe str(clotting) ; plot(clotting) ...
Computes the correlation matrix distance between two correlation matrices
cmd(x, y)
cmd(x, y)
x |
First correlation matrix |
y |
Second correlation matrix |
Returns the correlation matrix distance, which is a value between 0 and 1. The correlation matrix distance becomes zero for equal correlation matrices and unity if they differ to a maximum extent.
Claus Ekstrom [email protected]
Herdin, M., and Czink, N., and Ozcelik, H., and Bonek, E. (2005). Correlation matrix distance, a meaningful measure for evaluation of non-stationary mimo channels. IEEE VTC.
m1 <- matrix(rep(1, 16), 4) m2 <- matrix(c(1, 0, .5, .5, 0, 1, .5, .5, .5, .5, 1, .5, .5, .5, .5, 1), 4) m3 <- matrix(c(1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1), 4) cmd(m1, m1) cmd(m1, m2) cmd(m2, m3)
m1 <- matrix(rep(1, 16), 4) m2 <- matrix(c(1, 0, .5, .5, 0, 1, .5, .5, .5, .5, 1, .5, .5, .5, .5, 1), 4) m3 <- matrix(c(1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1), 4) cmd(m1, m1) cmd(m1, m2) cmd(m2, m3)
Add and set alpha channel
col.alpha(col, alpha = 1)
col.alpha(col, alpha = 1)
col |
a vector of RGB color(s) |
alpha |
numeric value between 0 and 1. Zero results fully transparent and 1 means full opacity |
This function adds and set an alpha channel to a RGB color
Claus Ekstrom [email protected]
Ekstrom, CT (2011) The R Primer.
newcol <- col.alpha("blue", .5)
newcol <- col.alpha("blue", .5)
Shades an RBG color
col.shade(col, shade = 0.5)
col.shade(col, shade = 0.5)
col |
a vector of RGB color(s) |
shade |
numeric value between 0 and 1. Zero means no change and 1 results in black |
This function shades an RGB color and returns the shaded RGB color (with alpha channel added)
Claus Ekstrom [email protected]
Ekstrom, CT (2011) The R Primer.
newcol <- col.shade("blue")
newcol <- col.shade("blue")
Tints an RBG color
col.tint(col, tint = 0.5)
col.tint(col, tint = 0.5)
col |
a vector of RGB color(s) |
tint |
numeric value between 0 and 1. Zero results in white and 1 means no change |
This function tints an RGB color and returns the tinted RGB color (with alpha channel added)
Claus Ekstrom [email protected]
Ekstrom, CT (2011) The R Primer.
newcol <- col.tint("blue")
newcol <- col.tint("blue")
Fast computation of apply(m, 2, cumsum)
colCumSum(m)
colCumSum(m)
m |
A matrix |
A matrix the same size as m with the column-wise cumulative sums.
Claus Ekstrom <[email protected]>
# Generate a 100 by 10000 matrix x <- matrix(rnorm(100*10000), nrow=100) result <- colCumSum(x)
# Generate a 100 by 10000 matrix x <- matrix(rnorm(100*10000), nrow=100) result <- colCumSum(x)
Form row means for multiple vectors, numeric arrays (or data frames) conditional on the number of non-missing observations. NA is returned unless a minimum number of observations is observed.
conditional_rowMeans(..., minobs = 1L)
conditional_rowMeans(..., minobs = 1L)
... |
a series of numeric vectors, arrays, or data frames that have can be combined with cbind |
minobs |
an integer stating the minimum number of non-NA observations necessary to compute the row mean. Defaults to 1. |
A numeric vector containing the row sums or NA if not enough non-NA observations are present
conditional_rowMeans(1:5, c(1:4, NA), c(1:3, NA, NA)) conditional_rowMeans(1:5, c(1:4, NA), c(1:3, NA, NA), minobs=0) conditional_rowMeans(1:5, c(1:4, NA), c(1:3, NA, NA), minobs=2)
conditional_rowMeans(1:5, c(1:4, NA), c(1:3, NA, NA)) conditional_rowMeans(1:5, c(1:4, NA), c(1:3, NA, NA), minobs=0) conditional_rowMeans(1:5, c(1:4, NA), c(1:3, NA, NA), minobs=2)
Fast binning of cumulative vector sum with new groups when the sum passes a threshold or the group size becomes too large
cumsumbinning(x, threshold, cutwhenpassed = FALSE, maxgroupsize = NULL)
cumsumbinning(x, threshold, cutwhenpassed = FALSE, maxgroupsize = NULL)
x |
A matrix of regressor variables. Must have the same number of rows as the length of y. |
threshold |
The value of the threshold that the cumulative group sum must not cross OR the threshold that each group sum must pass (when the argument cuwhatpassed is set to TRUE). |
cutwhenpassed |
A boolean. Should the threshold be the upper limit of the group sum (the default) or the value that each group sum needs to pass (when set to TRUE). |
maxgroupsize |
An integer that defines the maximum number of elements in each group. NAs count as part of each group but do not add to the group sum. NULL (the default) corresponds to no group size limits. |
Missing values (NA, Inf, NaN) are completely disregarded and pairwise complete cases are used f
An integer vector giving the group indices
Claus Ekstrom <[email protected]>
set.seed(1) x <- sample(10, 20, replace = TRUE) cumsumbinning(x, 15) cumsumbinning(x, 15, 3) x <- c(3, 4, 5, 12, 1, 5, 3) cumsumbinning(x, 10) cumsumbinning(x, 10, cutwhenpassed=TRUE)
set.seed(1) x <- sample(10, 20, replace = TRUE) cumsumbinning(x, 15) cumsumbinning(x, 15, 3) x <- c(3, 4, 5, 12, 1, 5, 3) cumsumbinning(x, 10) cumsumbinning(x, 10, cutwhenpassed=TRUE)
Fast computation of the distance correation matrix between two matrices with the same number of rows. Note that this is not the same as the correlation matrix distance that can be computed with the cmd function.
dCor(x, y)
dCor(x, y)
x |
A matrix with dimensions n*k. |
y |
A matrix with dimensions n*l. |
A number between 0 and 1 representing the distance covariance between x and y
Claus Ekstrom <[email protected]>
Fast computation of the distance covariance between two matrices with the same number of rows.
dCov(x, y)
dCov(x, y)
x |
A matrix with dimensions n*k. |
y |
A matrix with dimensions n*l. |
A number representing the distance covariance between x and y
Claus Ekstrom <[email protected]>
Compute all the single terms in the scope argument that can dropped from the model, and compute a table of the corresponding Wald test statistics.
## S3 method for class 'geeglm' drop1( object, scope, test = c("Wald", "none", "score", "sasscore"), method = c("robust", "naive", "sandwich"), ... )
## S3 method for class 'geeglm' drop1( object, scope, test = c("Wald", "none", "score", "sasscore"), method = c("robust", "naive", "sandwich"), ... )
object |
a fitted object of class geese. |
scope |
a formula giving the terms to be considered for adding or dropping. |
test |
the type of test to include. |
method |
Indicates which method is used for computing the standard
error. |
... |
other arguments. Not currently used |
An object of class "anova" summarizing the differences in fit between the models.
Claus Ekstrom [email protected]
drop1
, geeglm
, geese
library(geepack) data(ohio) fit <- geeglm(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="exch", scale.fix=TRUE) drop1(fit)
library(geepack) data(ohio) fit <- geeglm(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="exch", scale.fix=TRUE) drop1(fit)
Compute all the single terms in the scope argument that can dropped from the model, and compute a table of the corresponding Wald test statistics.
## S3 method for class 'geem' drop1( object, scope, test = c("Wald", "none", "score", "sasscore"), method = c("robust", "naive", "sandwich"), ... )
## S3 method for class 'geem' drop1( object, scope, test = c("Wald", "none", "score", "sasscore"), method = c("robust", "naive", "sandwich"), ... )
object |
a fitted object of class geese. |
scope |
a formula giving the terms to be considered for adding or dropping. |
test |
the type of test to include. |
method |
Indicates which method is used for computing the standard
error. |
... |
other arguments. Not currently used |
An object of class "anova" summarizing the differences in fit between the models.
Claus Ekstrom [email protected]
drop1
, geem
library(geeM) library(geepack) data(ohio) ## Not run: fit <- geem(resp ~ age + smoke + age:smoke, id=id, data=ohio, family="binomial", corstr="exch", scale.fix=TRUE) drop1(fit) ## End(Not run)
library(geeM) library(geepack) data(ohio) ## Not run: fit <- geem(resp ~ age + smoke + age:smoke, id=id, data=ohio, family="binomial", corstr="exch", scale.fix=TRUE) drop1(fit) ## End(Not run)
Information on earthquakes worldwide in 2015 with a magnitude greater than 3 on the Richter scale. The variables are just a subset of the variables available at the source
A data frame with 19777 observations on the following 22 variables.
a factor with time of the earthquake
latitude
a numeric vector giving the decimal degrees latitude. Negative values for southern latitudes
longitude
a numeric vector giving the decimal degrees longitude. Negative values for western longitudes
depth
Depth of the event in kilometers
mag
The magnitude for the event
place
a factor giving a textual description of named geographic region near to the event.
type
a factor with levels earthquake
mining explosion
rock burst
https://www.usgs.gov/programs/earthquake-hazards
data(earthquakes) with(earthquakes, place[which.max(mag)])
data(earthquakes) with(earthquakes, place[which.max(mag)])
Expands a contingency table to a data frame where each observation in the table becomes a single observation in the data frame with corresponding information for each for each combination of the table dimensions.
expand_table(x)
expand_table(x)
x |
A table or matrix |
A data frame with the table or matrix expanded
Claus Ekstrom [email protected]
expand_table(diag(3)) m <- matrix(c(2, 1, 3, 0, 0, 2), 3) expand_table(m) result <- expand_table(UCBAdmissions) head(result) # Combine into table again xtabs(~Admit + Gender + Dept, data=result)
expand_table(diag(3)) m <- matrix(c(2, 1, 3, 0, 0, 2), 3) expand_table(m) result <- expand_table(UCBAdmissions) head(result) # Combine into table again xtabs(~Admit + Gender + Dept, data=result)
Converts the factor labels to numeric values and returns the factor as a numeric vector
fac2num(x)
fac2num(x)
x |
A factor |
Returns a vector of numeric values. Elements in the input factor that cannot be converted to numeric will produce NA.
Returns a numeric vector of the same length as x
Claus Ekstrom [email protected]
f <- factor(c(1,2,1,3,2,1,2,3,1)) fac2num(f)
f <- factor(c(1,2,1,3,2,1,2,3,1)) fac2num(f)
Performs randomization tests of features identified by the Lasso
feature.test( x, y, B = 100, type.measure = "deviance", s = "lambda.min", keeplambda = FALSE, olsestimates = TRUE, penalty.factor = rep(1, nvars), alpha = 1, control = list(trace = FALSE, maxcores = 24), ... )
feature.test( x, y, B = 100, type.measure = "deviance", s = "lambda.min", keeplambda = FALSE, olsestimates = TRUE, penalty.factor = rep(1, nvars), alpha = 1, control = list(trace = FALSE, maxcores = 24), ... )
x |
input matrix, of dimension nobs x nvars; each row is an observation vector. |
y |
quantitative response variable of length nobs |
B |
The number of randomizations used in the computations |
type.measure |
loss to use for cross-validation. See |
s |
Value of the penalty parameter 'lambda' at which predictions are
required. Default is the entire sequence used to create the model. See
|
keeplambda |
If set to |
olsestimates |
Logical. Should the test statistic be based on OLS
estimates from the model based on the variables selected by the lasso.
Defaults to |
penalty.factor |
a vector of weights used for adaptive lasso. See
|
alpha |
The elasticnet mixing parameter. See |
control |
A list of options that control the algorithm. Currently
|
... |
Other arguments passed to |
Returns a list of 7 variables:
p.full |
The p-value for the test of the full set of variables selected by the lasso (based on the OLS estimates) |
ols.selected |
A vector of the indices of the non-zero
variables selected by |
p.maxols |
The p-value for the maximum of the OLS test statistics |
lasso.selected |
A vector of
the indices of the non-zero variables selected by |
p.maxlasso |
The p-value for the maximum of the lasso test statistics |
lambda.orig |
The value of lambda used in the computations |
B |
The number of permutations used |
Claus Ekstrom [email protected] and Kasper Brink-Jensen [email protected]
Brink-Jensen, K and Ekstrom, CT 2014. Inference for feature selection using the Lasso with high-dimensional data. https://arxiv.org/abs/1403.4296
glmnet
# Simulate some data x <- matrix(rnorm(30*100), nrow=30) y <- rnorm(30, mean=1*x[,1]) # Make inference for features ## Not run: feature.test(x, y) ## End(Not run)
# Simulate some data x <- matrix(rnorm(30*100), nrow=30) y <- rnorm(30, mean=1*x[,1]) # Make inference for features ## Not run: feature.test(x, y) ## End(Not run)
Fill down missing values with the latest non-missing value
filldown(x)
filldown(x)
x |
A vector |
A vector or list with the NA's replaced by the last observed value.
Claus Ekstrom <[email protected]>
a <- c(1:5, "Howdy", NA, NA, 2:3, NA) filldown(a) filldown(c(NA, NA, NA, 3:5))
a <- c(1:5, "Howdy", NA, NA, 2:3, NA) filldown(a) filldown(c(NA, NA, NA, 3:5))
The geekin function fits generalized estimating equations but where the correlation structure is given as linear function of (scaled) fixed correlation structures.
geekin( formula, family = gaussian, data, weights, subset, id, na.action, control = geepack::geese.control(...), varlist, ... )
geekin( formula, family = gaussian, data, weights, subset, id, na.action, control = geepack::geese.control(...), varlist, ... )
formula |
See corresponding documentation to |
family |
See corresponding documentation to |
data |
See corresponding documentation to |
weights |
See corresponding documentation to |
subset |
See corresponding documentation to |
id |
a vector which identifies the clusters. The length of |
na.action |
See corresponding documentation to |
control |
See corresponding documentation to |
varlist |
a list containing one or more matrix or bdsmatrix objects that represent the correlation structures |
... |
further arguments passed to or from other methods. |
The geekin function is essentially a wrapper function to geeglm
.
Through the varlist argument, it allows for correlation structures of the
form
R = sum_i=1^k alpha_i R_i
where alpha_i are(nuisance) scale parameters that are used to scale the off-diagonal elements of the individual correlation matrices, R_i.
Returns an object of type geeglm
.
Claus Ekstrom [email protected]
lmekin
, geeglm
# Get dataset library(kinship2) library(mvtnorm) data(minnbreast) breastpeda <- with(minnbreast[order(minnbreast$famid), ], pedigree(id, fatherid, motherid, sex, status=(cancer& !is.na(cancer)), affected=proband, famid=famid)) set.seed(10) nfam <- 6 breastped <- breastpeda[1:nfam] # Simulate a response # Make dataset for lme4 df <- lapply(1:nfam, function(xx) { as.data.frame(breastped[xx]) }) mydata <- do.call(rbind, df) mydata$famid <- rep(1:nfam, times=unlist(lapply(df, nrow))) y <- lapply(1:nfam, function(xx) { x <- breastped[xx] rmvtnorm.pedigree(1, x, h2=0.3, c2=0) }) yy <- unlist(y) library(geepack) geekin(yy ~ 1, id=mydata$famid, varlist=list(2*kinship(breastped))) # lmekin(yy ~ 1 + (1|id), data=mydata, varlist=list(2*kinship(breastped)),method="REML")
# Get dataset library(kinship2) library(mvtnorm) data(minnbreast) breastpeda <- with(minnbreast[order(minnbreast$famid), ], pedigree(id, fatherid, motherid, sex, status=(cancer& !is.na(cancer)), affected=proband, famid=famid)) set.seed(10) nfam <- 6 breastped <- breastpeda[1:nfam] # Simulate a response # Make dataset for lme4 df <- lapply(1:nfam, function(xx) { as.data.frame(breastped[xx]) }) mydata <- do.call(rbind, df) mydata$famid <- rep(1:nfam, times=unlist(lapply(df, nrow))) y <- lapply(1:nfam, function(xx) { x <- breastped[xx] rmvtnorm.pedigree(1, x, h2=0.3, c2=0) }) yy <- unlist(y) library(geepack) geekin(yy ~ 1, id=mydata$famid, varlist=list(2*kinship(breastped))) # lmekin(yy ~ 1 + (1|id), data=mydata, varlist=list(2*kinship(breastped)),method="REML")
Compute Goodman-Kruskal's gamma statistic for a two-dimensional table of ordered categories
gkgamma(x, conf.level = 0.95)
gkgamma(x, conf.level = 0.95)
x |
A matrix or table representing the two-dimensional ordered contingency table of observations |
conf.level |
Level of confidence interval |
A list with class htest
containing the following components:
statistic |
the value the test statistic for testing no association |
p.value |
the p-value for the test |
estimate |
the value the gamma estimate |
conf.int |
the confidence interval for the gamma estimate |
method |
a character string indicating the type of test performed |
data.name |
a character string indicating the name of the data input |
observed |
the observed counts |
s0 |
the SE used when computing the test statistics |
s1 |
the SE used when computing the confidence interval |
Claus Ekstrom [email protected]
Goodman, Leo A. and Kruskal, William H. (1954). "Measures of Association for Cross Classifications". Journal of the American Statistical Association 49 (268): 732-764.
# Data from the Glostrup study comparing smoking to overall health in males smoke <- matrix(c(16, 15, 13, 10, 1, 73, 75, 59, 81, 29, 6, 6, 7, 17, 3, 1, 0, 1, 3, 1), ncol=4) colnames(smoke) <- c("VGood", "Good", "Fair", "Bad") # General health status rownames(smoke) <- c("Never", "No more", "1-14", "15-24", "25+") # Smoke amount gkgamma(smoke) chisq.test(smoke)
# Data from the Glostrup study comparing smoking to overall health in males smoke <- matrix(c(16, 15, 13, 10, 1, 73, 75, 59, 81, 29, 6, 6, 7, 17, 3, 1, 0, 1, 3, 1), ncol=4) colnames(smoke) <- c("VGood", "Good", "Fair", "Bad") # General health status rownames(smoke) <- c("Never", "No more", "1-14", "15-24", "25+") # Smoke amount gkgamma(smoke) chisq.test(smoke)
Average yearly summer (June, July, August) air temperature for Tasiilaq, Greenland
A data frame with 51 observations on the following 2 variables.
year
average air temperature (degrees Celcius)
Data provided by Sebastian Mernild.
Originally obtained from
http://www.dmi.dk/dmi/index/gronland/vejrarkiv-gl.htm.
Added by Claus
Ekstrom <[email protected]>
Aktuelt Naturvidenskab september 2010.
http://aktuelnaturvidenskab.dk/fileadmin/an/nr-4/an4_2010gletscher.pdf
data(greenland) model <- lm(airtemp ~ year, data=greenland) plot(greenland$year, greenland$airtemp, xlab="Year", ylab="Air temperature") abline(model, col="red")
data(greenland) model <- lm(airtemp ~ year, data=greenland) plot(greenland$year, greenland$airtemp, xlab="Year", ylab="Air temperature") abline(model, col="red")
Dataset on subjective happiness, tax rates, population sizes, continent, and major religion for 148 countries
A data frame with 148 observations on the following 6 variables.
a factor with 148 levels that contain the country names
a numeric vector with the average subject happiness score (on a scale from 0-10)
a numeric vector showing the tax revenue as percentage of GDP
a factor with levels Buddhist
Christian
Hindu
Muslim
None
or Other
a factor with levels AF
, AS
,
EU
, NA
, OC
, SA
, corresponding to the continents
Africa, Asia, Europe, North America, Ocenaia, South American, respectively
a numeric vector showing the population (in millions)
Data collected by Ellen Ekstroem.
Population sizes are from
Wikipedia per August 2nd, 2012
https://en.wikipedia.org/wiki/List_of_countries_by_population
Major
religions are from Wikipedia per August 2nd, 2012
https://en.wikipedia.org/wiki/Religions_by_country
Tax rates are
from Wikipedia per August 2nd, 2012
https://en.wikipedia.org/wiki/List_of_countries_by_tax_revenue_as_percentage_of_GDP
Average happiness scores are from "Veenhoven, R. Average happiness in
148 nations 2000-2009, World Database of Happiness, Erasmus University
Rotterdam, The Netherlands". Assessed on August 2nd, 2012 at:
https://worlddatabaseofhappiness-archive.eur.nl/hap_nat/findingreports/RankReport_AverageHappiness.php
data(happiness) with(happiness, symbols(tax, happy, circles=sqrt(population)/8, inches=FALSE, bg=continent)) # # Make a prettier image with transparent colors # newcols <- rgb(t(col2rgb(palette())), alpha=100, maxColorValue=255) with(happiness, symbols(tax, happy, circles=sqrt(population)/8, inches=FALSE, bg=newcols[continent], xlab="Tax (% of GDP)", ylab="Happiness")) # # Simple analysis # res <- lm(happy ~ religion + population + tax:continent, data=happiness) summary(res)
data(happiness) with(happiness, symbols(tax, happy, circles=sqrt(population)/8, inches=FALSE, bg=continent)) # # Make a prettier image with transparent colors # newcols <- rgb(t(col2rgb(palette())), alpha=100, maxColorValue=255) with(happiness, symbols(tax, happy, circles=sqrt(population)/8, inches=FALSE, bg=newcols[continent], xlab="Tax (% of GDP)", ylab="Happiness")) # # Simple analysis # res <- lm(happy ~ religion + population + tax:continent, data=happiness) summary(res)
Show both the head and tail of an R object
ht(x, n = 6L, m = n, returnList = FALSE, ...)
ht(x, n = 6L, m = n, returnList = FALSE, ...)
x |
The object to show |
n |
The number of elements to list for the head |
m |
The number of elements to list for the tail |
returnList |
Logical. Should the result be returned as a list |
... |
additional arguments passed to functions (not used at the moment) |
This function does no error checking and it is up to the user to ensure that the input is indeed symmetric, positive-definite, and a matrix.
NULL unless returnList is set to TRUE in which case a list is returned
Claus Ekstrom, [email protected].
ht(trees) ht(diag(20)) ht(1:20) ht(1:20, returnList=TRUE)
ht(trees) ht(diag(20)) ht(1:20) ht(1:20, returnList=TRUE)
Alleles are assumed to be numerated from 1 and up with no missing label. Thus if the largest value in either allele1 or allele2 is K then we assume that there can be at least K possible alleles. Genotypes are sorted such the the smallest allele comes first, i.e., 2x1 -> 1x2, and 2x3 -> 2x3
hwe_frequencies(allele1, allele2, min_alleles = 0L)
hwe_frequencies(allele1, allele2, min_alleles = 0L)
allele1 |
An integer vector (starting with values 1 upwards) of first alleles |
allele2 |
An integer vector (starting with values 1 upwards) of second alleles |
min_alleles |
A minimum number of unique alleles available |
A list with three variables: allele_freq for estimated allele frequencies, genotype_freq for estimated genotype_frequencies (under HWE assumption), obs_genotype is the frequency of the genotypes, available_genotypes is the number of available genotypes used for the estimation, and unique_alleles is the number of unique alleles (matches the length of allele_freq)
Claus Ekstrom <[email protected]>
al1 <- sample(1:5, size=1000, replace=TRUE, prob=c(.4, .2, .2, .1, .1)) al2 <- sample(1:5, size=1000, replace=TRUE, prob=c(.4, .2, .2, .1, .1)) hwe_frequencies(al1, al2)
al1 <- sample(1:5, size=1000, replace=TRUE, prob=c(.4, .2, .2, .1, .1)) al2 <- sample(1:5, size=1000, replace=TRUE, prob=c(.4, .2, .2, .1, .1)) hwe_frequencies(al1, al2)
The impact of advertizing impact, temperature, and price on ice cream consumption
A data frame with 30 observations on the following 4 variables.
a numeric vector character vector giving the standardized price
temperature in degrees Fahrenheit
a factor with levels 1_low
2_medium
3_high
a factor with levels posters
radio
television
Unknown origin
data("icecreamads")
data("icecreamads")
Kolmogorov-Smirnov goodness of fit test for cumulative discrete data.
ks_cumtest(x, B = 10000L, prob = NULL)
ks_cumtest(x, B = 10000L, prob = NULL)
x |
A vector representing the contingency table. |
B |
The number of simulations used to compute the p-value. |
prob |
A positive vector of the same length as x representing the distribution under the null hypothesis. It will be scaled to sum to 1. If NULL (the default) then a uniform distribution is assumed. |
The name of the function might change in the future so keep that in mind!
Simulation is done by random sampling from the null hypothesis.
A list of class "htest" giving the simulation results.
Claus Ekstrom <[email protected]>
x <- 1:6 ks_cumtest(x)
x <- 1:6 ks_cumtest(x)
Artificial dataset to show that the p-value obtained for the Kruskal Wallis is only valid _after_ the distributional form has been checked to be the same for all groups.
An artificial data frame with 18 observations in each of three groups.
measurements for group 1
measurements for group 2
measurements for group 3
Data example found on the internet
data(kwdata) newdata <- stack(kwdata) kruskal.test(values ~ ind, newdata)
data(kwdata) newdata <- stack(kwdata) kruskal.test(values ~ ind, newdata)
The estimated life expectancy for newborn Danes split according to gender.
A data frame with 70 observations on the following 3 variables.
year
a character vector
giving the calendar interval on which the estimation was based.
male
a numeric vector
Life expectancy for males (in years).
female
a numeric vector
Life expectancy for females (in years)
myear
a numeric vector
The midpoint of the year interval
Data collected from Danmarks Statistik. See https://www.dst.dk/en for more information.
data(lifeexpect) plot(lifeexpect$myear, lifeexpect$male)
data(lifeexpect) plot(lifeexpect$myear, lifeexpect$male)
Loads and extracts an object from an RData file
loadRData(filename)
loadRData(filename)
filename |
The path to the RData file |
Returns an R object
An R object
ricardo (from GitHub)
## Not run: d <- loadRData("~/blah/ricardo.RData") ## End(Not run)
## Not run: d <- loadRData("~/blah/ricardo.RData") ## End(Not run)
Split a matrix into block diagonal sub matrices according to clusters and combine the lower triangular parts into a vector
lower.tri.vector(x, cluster = rep(1, nrow(x)), diag = FALSE)
lower.tri.vector(x, cluster = rep(1, nrow(x)), diag = FALSE)
x |
a square matrix |
cluster |
numeric or factor. Is used to identify the sub-matrices of
|
diag |
logical. Should the diagonal be included? |
Returns a numeric vector containing the elements of the lower triangular sub matrices.
Claus Ekstrom [email protected]
m <- matrix(1:64, ncol=8) cluster <- c(1, 1, 1, 1, 2, 2, 3, 3) lower.tri.vector(m, cluster)
m <- matrix(1:64, ncol=8) cluster <- c(1, 1, 1, 1, 2, 2, 3, 3) lower.tri.vector(m, cluster)
Researchers in a Midwestern county tracked flu cases requiring hospitalization in those residents aged 65 and older during a two-month period one winter. They matched each case with 2 controls by sex and age (150 cases, 300 controls). They used medical records to determine whether cases and controls had received a flu vaccine shot and whether they had underlying lung disease. They wanted to know whether flu vaccination prevents hospitalization for flu (severe cases of flu). Underlying lung disease is a potential confounder.
A data frame with 450 observations on the following 4 variables.
id
a numeric vector
iscase
a factor with levels Control
Case
vaccine
a factor with levels Not
Vaccinated
lung
a factor with levels None
Disease
Modified from: Stokes, Davis, Koch (2000). "Categorical Data Analysis Using the SAS System," Chapter 10.
data(matched)
data(matched)
Fast computation of the maximum subarray sum of a vector using Kadane's algorithm. The implementation handles purely negative numbers.
maximum_subarray(x)
maximum_subarray(x)
x |
A vector |
A list with three elements: sum (the maximum subarray sum), start (the starting index of the subarray) and end (the ending index of the subarray)
Claus Ekstrom <[email protected]>
maximum_subarray(1:4) maximum_subarray(c(-2, 1, -3, 4, -1, 2, 1, -5, 4)) maximum_subarray(rnorm(100000))
maximum_subarray(1:4) maximum_subarray(c(-2, 1, -3, 4, -1, 2, 1, -5, 4)) maximum_subarray(rnorm(100000))
Collection of miscellaneous useful and semi-useful functions and add-on functions that enhances a number of existing packages and provides In particular in relation to statistical genetics
Package: | MESS |
Type: | Package |
Version: | 1.0 |
Date: | 2012-03-29 |
License: | GPL-2 |
how to use the package, including the most important ~~
Claus Thorn Ekstrøm [email protected]
Maintainer: Claus Thorn Ekstrøm
[email protected]
Ekstrøm, C. (2011). The R Primer. Chapman & Hall.
Fast computation of simple regression slopes for each predictor represented by a column in a matrix
mfastLmCpp(y, x, addintercept = TRUE)
mfastLmCpp(y, x, addintercept = TRUE)
y |
A vector of outcomes. |
x |
A matrix of regressor variables. Must have the same number of rows as the length of y. |
addintercept |
A logical that determines if the intercept should be included in all analyses (TRUE) or not (FALSE) |
No error checking is done
A data frame with three variables: coefficients, stderr, and tstat that gives the slope estimate, the corresponding standard error, and their ratio for each column in x.
Claus Ekstrom <[email protected]>
## Not run: // Generate 100000 predictors and 100 observations x <- matrix(rnorm(100*100000), nrow=100) y <- rnorm(100, mean=x[,1]) mfastLmCpp(y, x) ## End(Not run)
## Not run: // Generate 100000 predictors and 100 observations x <- matrix(rnorm(100*100000), nrow=100) y <- rnorm(100, mean=x[,1]) mfastLmCpp(y, x) ## End(Not run)
Monte Carlo test in a two-way contingency table with the total number of observations fixed, row margin fixed, or both margins fixed.
monte_carlo_chisq_test(x, margin = c("N", "rows", "both"), B = 100000L)
monte_carlo_chisq_test(x, margin = c("N", "rows", "both"), B = 100000L)
x |
A matrix representing the contingency table. |
margin |
A string that determines which margin is fixed: Either "N" for the total number of observations (the default), "rows" for fixed row sums, and "both" for simultaneously fixed row and column sums. |
B |
The number of simulations used to compute the p-value. |
Simulation is done by random sampling from the set of all tables with given marginal(s), and works only if the relevant marginal(s) are strictly positive. Continuity correction is never used, and the statistic is quoted without it.
A list of class "htest" giving the simulation results.
Claus Ekstrom <[email protected]>
m <- matrix(c(12, 4, 8, 6), 2) chisq.test(m) chisq.test(m, correct=FALSE) monte_carlo_chisq_test(m) fisher.test(m) monte_carlo_chisq_test(m, margin="both") m2 <- matrix(c(9, 3, 3, 7), 2) monte_carlo_chisq_test(m, margin="N") monte_carlo_chisq_test(m, margin="both")
m <- matrix(c(12, 4, 8, 6), 2) chisq.test(m) chisq.test(m, correct=FALSE) monte_carlo_chisq_test(m) fisher.test(m) monte_carlo_chisq_test(m, margin="both") m2 <- matrix(c(9, 3, 3, 7), 2) monte_carlo_chisq_test(m, margin="N") monte_carlo_chisq_test(m, margin="both")
Monthly levels of ammonia nitrogen in a river over two years
A data frame with 120 observations on the following 3 variables.
The ammonia nitrogen levels (mg/l). A value of zero corresponds to a censoring, but it really is censored at <0.01
A logical vector indicating if the value was censored
The year
Found on the internet and partly simulated
data(nh4)
data(nh4)
ordered.clusters
determines if identical elements of a vector appear
in contiguous clusters, and returns TRUE
if the do and FALSE
otherwise.
ordered.clusters(id)
ordered.clusters(id)
id |
a vector |
The function returns TRUE if the elements appear in contiguous clusters and FALSE otherwise
Claus Ekstrom [email protected] with suggestions from Peter Dalgaard.
x <- c(1, 1, 1, 2, 2, 3, 4, 1, 5, 5, 5) ordered.clusters(x) ordered.clusters(sort(x)) ordered.clusters(x[order(x)])
x <- c(1, 1, 1, 2, 2, 3, 4, 1, 5, 5, 5) ordered.clusters(x) ordered.clusters(sort(x)) ordered.clusters(x[order(x)])
Fast computation of indices of all pairwise element of a vector of length n.
pairwise_combination_indices(n, self = FALSE)
pairwise_combination_indices(n, self = FALSE)
n |
A number giving the number of elements to create all pairwise indices from |
self |
A logical that determines whether a column should also be multiplied by itself. |
Note that the output order of columns corresponds to the order of the columns in x. First column 1 is multiplied with each of the other columns, then column 2 with the remaining columns etc.
A matrix with n*(n+1)/2 rows (if self=TRUE) or n*(n-1)/2 rows (if self=FALSE, the default) and two columns gicing all possible combinations of indices.
Claus Ekstrom <[email protected]>
pairwise_combination_indices(3) pairwise_combination_indices(4, self=TRUE)
pairwise_combination_indices(3) pairwise_combination_indices(4, self=TRUE)
Fast computation of all pairwise element-wise column products of a matrix.
pairwise_Schur_product(x, self = FALSE)
pairwise_Schur_product(x, self = FALSE)
x |
A matrix with dimensions r*c. |
self |
A logical that determines whether a column should also be multiplied by itself. |
Note that the output order of columns corresponds to the order of the columns in x. First column 1 is multiplied with each of the other columns, then column 2 with the remaining columns etc.
A matrix with the same number of rows as x and a number of columns corresponding to c choose 2 (+ c if self is TRUE), where c is the number of columns of x.
Claus Ekstrom <[email protected]>
X <- cbind(rep(1, 4), 1:4, 4:1) pairwise_Schur_product(X) pairwise_Schur_product(X, self=TRUE)
X <- cbind(rep(1, 4), 1:4, 4:1) pairwise_Schur_product(X) pairwise_Schur_product(X, self=TRUE)
Calculate pairwise correlations between group levels with corrections for multiple testing.
pairwise.cor.test( x, g, p.adjust.method = p.adjust.methods, method = c("pearson", "kendall", "spearman"), ... )
pairwise.cor.test( x, g, p.adjust.method = p.adjust.methods, method = c("pearson", "kendall", "spearman"), ... )
x |
response vector. |
g |
grouping vector or factor. |
p.adjust.method |
method for adjusting p values (see |
method |
string argument to set the method to compute the correlation. Possibilities are "pearson" (the default), "kendall", and "spearman" |
... |
additional arguments passed to |
Note that correlation tests require that the two vectors examined are of the same length.
Thus, if the grouping defines groups of varying lengths then the specific correlation is
not computed and a NA
is returned instead. The adjusted p values are only based on
the actual correlation that are computed.
Extra arguments that are passed on to cor.test
may or may not be sensible in this context.
Object of class pairwise.htest
attach(airquality) Month <- factor(Month, labels = month.abb[5:9]) pairwise.cor.test(Ozone, Month) pairwise.cor.test(Ozone, Month, p.adj = "bonf") detach()
attach(airquality) Month <- factor(Month, labels = month.abb[5:9]) pairwise.cor.test(Ozone, Month) pairwise.cor.test(Ozone, Month, p.adj = "bonf") detach()
Prints the histogram and corresponding density curve
panel.hist(x, col.bar = "gray", ...)
panel.hist(x, col.bar = "gray", ...)
x |
a numeric vector of x values |
col.bar |
the color of the bars |
... |
options passed to hist |
This function prints a combined histogram and density curve for use with the pairs function
Claus Ekstrom [email protected]
Ekstrom, CT (2011) The R Primer.
pairs(~ Ozone + Temp + Wind + Solar.R, data=airquality, lower.panel=panel.smooth, diag.panel=panel.hist, upper.panel=panel.r2)
pairs(~ Ozone + Temp + Wind + Solar.R, data=airquality, lower.panel=panel.smooth, diag.panel=panel.hist, upper.panel=panel.r2)
Prints the R2 with text size depending on the size of R2
panel.r2(x, y, digits = 2, cex.cor, ...)
panel.r2(x, y, digits = 2, cex.cor, ...)
x |
a numeric vector of x values |
y |
a numeric vector of y values |
digits |
a numeric value giving the number of digits to present |
cex.cor |
scaling fator for the size of text |
... |
extra options (not used at the moment) |
This function is a slight modification of the panel.cor function defined on the pairs help page. It calculated and prints the squared correlation, R2, with text size depending on the proportion of explained variation.
Claus Ekstrom [email protected]
Ekstrom, CT (2011) The R Primer.
pairs(~ Ozone + Temp + Wind + Solar.R, data=airquality, lower.panel=panel.smooth, upper.panel=panel.r2)
pairs(~ Ozone + Temp + Wind + Solar.R, data=airquality, lower.panel=panel.smooth, upper.panel=panel.r2)
Damage scores (ordinal scale) for Picea Sitchesis shoots at two dates, at four temperatures, and 4 ozone Levels
An artificial data frame with 18 observations in each of three groups.
a character vector giving the date
temperature in degrees Celcius
Ozone concentration at 4 different levels
the damage score from 0-4, higher is more damage
The number of occurrences of this group
P.W. Lucas, D.A. Cottam, L.J. Sheppard, B.J. Francis (1988). "Growth Responses and Delayed Winter Hardening in Sitka Spruce Following Summer Exposure to Ozone," New Phytologist, Vol. 108, pp. 495-504.
data(picea)
data(picea)
Fast computation of several simple linear regression, where the outcome is analyzed with several marginal analyses, or where several outcome are analyzed separately, or a combination of both.
plr(y, x, addintercept = TRUE) ## S3 method for class 'numeric' plr(y, x, addintercept = TRUE) ## S3 method for class 'matrix' plr(y, x, addintercept = TRUE)
plr(y, x, addintercept = TRUE) ## S3 method for class 'numeric' plr(y, x, addintercept = TRUE) ## S3 method for class 'matrix' plr(y, x, addintercept = TRUE)
y |
either a vector (of length N) or a matrix (with N rows) |
x |
a matrix with N rows |
addintercept |
boolean. Should the intercept be included in the model by default (TRUE) |
a data frame (if Y is a vector) or list of data frames (if Y is a matrix)
Claus Ekstrom [email protected]
mfastLmCpp
N <- 1000 # Number of observations Nx <- 20 # Number of independent variables Ny <- 80 # Number of dependent variables # Simulate outcomes that are all standard Gaussians Y <- matrix(rnorm(N*Ny), ncol=Ny) X <- matrix(rnorm(N*Nx), ncol=Nx) plr(Y, X)
N <- 1000 # Number of observations Nx <- 20 # Number of independent variables Ny <- 80 # Number of dependent variables # Simulate outcomes that are all standard Gaussians Y <- matrix(rnorm(N*Ny), ncol=Ny) X <- matrix(rnorm(N*Nx), ncol=Nx) plr(Y, X)
Compute power of test, or determine parameters to obtain target power.
power_binom_test( n = NULL, p0 = NULL, pa = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "less", "greater") )
power_binom_test( n = NULL, p0 = NULL, pa = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "less", "greater") )
n |
Number of observations |
p0 |
Probability under the null |
pa |
Probability under the alternative |
sig.level |
Significance level (Type I error probability) |
power |
Power of test (1 minus Type II error probability) |
alternative |
One- or two-sided test |
The procedure uses uniroot to find the root of a discontinuous function so some errors may pop up due to the given setup that causes the root-finding procedure to fail. Also, since exact binomial tests are used we have discontinuities in the function that we use to find the root of but despite this the function is usually quite stable.
Object of class power.htest
, a list of the arguments
(including the computed one) augmented with method and note elements.
Claus Ekstrom [email protected]
power_binom_test(n = 50, p0 = .50, pa = .75) ## => power = 0.971 power_binom_test(p0 = .50, pa = .75, power = .90) ## => n = 41 power_binom_test(n = 50, p0 = .25, power = .90, alternative="less") ## => pa = 0.0954
power_binom_test(n = 50, p0 = .50, pa = .75) ## => power = 0.971 power_binom_test(p0 = .50, pa = .75, power = .90) ## => n = 41 power_binom_test(n = 50, p0 = .25, power = .90, alternative="less") ## => pa = 0.0954
Compute power of test, or determine parameters to obtain target power for matched case-control studies.
power_mcnemar_test( n = NULL, paid = NULL, psi = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "one.sided"), method = c("normal", "exact", "cond.exact") )
power_mcnemar_test( n = NULL, paid = NULL, psi = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "one.sided"), method = c("normal", "exact", "cond.exact") )
n |
Number of observations (number of pairs) |
paid |
The probability that a case patient is not exposed and that the corresponding control patient was exposed (specifying p_12 in the 2 x 2 table). It is assumed that this is the _smaller_ of the two discordant probabilities. |
psi |
The relative probability that a control patient is not exposed and that the corresponding case patient was exposed compared to the probability that a case patient is not exposed and that the corresponding control patient was exposed (i.e., p_21 / p_12 in the 2x2 table). Also called the discordant proportion ratio. psi must be larger than or equal to 1 since paid was the smaller of the two discordant probabilities. |
sig.level |
Significance level (Type I error probability) |
power |
Power of test (1 minus Type II error probability) |
alternative |
One- or two-sided test |
method |
Power calculations based on exact or asymptotic test. The default (normal) corresponds to an approximative test, "exact" is the unconditional exact test, while "cond.exact" is a conditional exact test (given fixed n). The "exact" method is very slow for large values of n so it is most useful for fixed (and moderately-sized) n. |
Object of class power.htest
, a list of the arguments
(including the computed one) augmented with method and note elements.
uniroot
is used to solve power equation for unknowns, so you
may see errors from it, notably about inability to bracket the root when
invalid arguments are given.
Claus Ekstrom [email protected]
Duffy, S (1984). Asymptotic and Exact Power for the McNemar Test and its Analogue with R Controls per Case
Fagerland MW, Lydersen S, Laake P. (2013) The McNemar test for binary matched-pairs data: mid-p and asymptotic are better than exact conditional. BMC Medical Research Methodology.
# Assume that pi_12 is 0.125 and we wish to detect an OR of 2. # This implies that pi_12=0.25, and with alpha=0.05, and a power of 90% you get power_mcnemar_test(n=NULL, paid=.125, psi=2, power=.9) power_mcnemar_test(n=NULL, paid=.1, psi=2, power=.8, method="normal") power_mcnemar_test(n=NULL, paid=.1, psi=2, power=.8)
# Assume that pi_12 is 0.125 and we wish to detect an OR of 2. # This implies that pi_12=0.25, and with alpha=0.05, and a power of 90% you get power_mcnemar_test(n=NULL, paid=.125, psi=2, power=.9) power_mcnemar_test(n=NULL, paid=.1, psi=2, power=.8, method="normal") power_mcnemar_test(n=NULL, paid=.1, psi=2, power=.8)
Compute power of test, or determine parameters to obtain target power for equal and unequal sample sizes.
power_prop_test( n = NULL, p1 = NULL, p2 = NULL, sig.level = 0.05, power = NULL, ratio = 1, alternative = c("two.sided", "one.sided"), tol = .Machine$double.eps^0.25 )
power_prop_test( n = NULL, p1 = NULL, p2 = NULL, sig.level = 0.05, power = NULL, ratio = 1, alternative = c("two.sided", "one.sided"), tol = .Machine$double.eps^0.25 )
n |
Number of observations (in group 1) |
p1 |
Probability in one group |
p2 |
Probability in other group |
sig.level |
Significance level (Type I error probability) |
power |
Power of test (1 minus Type II error probability) |
ratio |
The ratio n2/n1 between the larger group and the smaller group. Should be a value equal to or greater than 1 since n2 is the larger group. Defaults to 1 (equal group sizes) |
alternative |
String. Can be one- or two-sided test. Can be abbreviated. |
tol |
Numerical tolerance used in root finding, the default providing (at least) four significant digits |
Exactly one of the parameters n
, delta
, power
, sd
, sig.level
, ratio
sd.ratio
must be passed as NULL, and that parameter is determined from the others. Notice that the last two have non-NULL defaults
so NULL must be explicitly passed if you want to compute them.
Object of class power.htest
, a list of the arguments (including the computed one) augmented with method
and note
elements.
uniroot
is used to solve power equation for unknowns, so you may
see errors from it, notably about inability to bracket the root
when invalid arguments are given.
Claus Ekstrom [email protected]
power.prop.test
, power_t_test
, power.t.test
power_prop_test(n=NULL, p1=.65, p2=.85, power=.8, ratio=2)
power_prop_test(n=NULL, p1=.65, p2=.85, power=.8, ratio=2)
Compute power of test, or determine parameters to obtain target power for equal and unequal sample sizes.
power_t_test( n = NULL, delta = NULL, sd = 1, sig.level = 0.05, power = NULL, ratio = 1, sd.ratio = 1, type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided", "one.sided"), df.method = c("welch", "classical"), strict = TRUE )
power_t_test( n = NULL, delta = NULL, sd = 1, sig.level = 0.05, power = NULL, ratio = 1, sd.ratio = 1, type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided", "one.sided"), df.method = c("welch", "classical"), strict = TRUE )
n |
Number of observations (in the smallest group if two groups) |
delta |
True difference in means |
sd |
Standard deviation |
sig.level |
Significance level (Type I error probability) |
power |
Power of test (1 minus Type II error probability) |
ratio |
The ratio n2/n1 between the larger group and the smaller group. Should be a value equal to or greater than 1 since n2 is the larger group. Defaults to 1 (equal group sizes). If ratio is set to NULL (i.e., find the ratio) then the ratio might be smaller than 1 depending on the desired power and ratio of the sd's. |
sd.ratio |
The ratio sd2/sd1 between the standard deviations in the larger group and the smaller group. Defaults to 1 (equal standard deviations in the two groups) |
type |
Type of t test |
alternative |
One- or two-sided test |
df.method |
Method for calculating the degrees of default. Possibilities are welch (the default) or classical. |
strict |
Use strict interpretation in two-sided case. Defaults to TRUE unlike the standard power.t.test function. |
Exactly one of the parameters n
, delta
, power
, sd
, sig.level
, ratio
sd.ratio
must be passed as NULL,
and that parameter is determined from the others. Notice that the last two have non-NULL defaults
so NULL must be explicitly passed if you want to compute them.
The default strict = TRUE
ensures that the power will include the probability
of rejection in the opposite direction of the true effect, in the
two-sided case. Without this the power will be half the
significance level if the true difference is zero.
Object of class power.htest
, a list of the arguments (including the computed one)
augmented with method
and note
elements.
uniroot
is used to solve power equation for unknowns, so you may
see errors from it, notably about inability to bracket the root
when invalid arguments are given.
Claus Ekstrom [email protected]
power.t.test
, power_prop_test
, power.prop.test
# Sampling with a ratio of 1:4 power_t_test(delta=300, sd=450, power=.8, ratio=4) # Equal group sizes but different sd's # The sd in the second group is twice the sd in the second group power_t_test(delta=300, sd=450, power=.8, sd.ratio=2) # Fixed group one size to 50 individuals, but looking for the number of individuals in the # second group. Different sd's with twice the sd in the larger group power_t_test(n=50, delta=300, sd=450, power=.8, ratio=NULL, sd.ratio=2)
# Sampling with a ratio of 1:4 power_t_test(delta=300, sd=450, power=.8, ratio=4) # Equal group sizes but different sd's # The sd in the second group is twice the sd in the second group power_t_test(delta=300, sd=450, power=.8, sd.ratio=2) # Fixed group one size to 50 individuals, but looking for the number of individuals in the # second group. Different sd's with twice the sd in the larger group power_t_test(n=50, delta=300, sd=450, power=.8, ratio=NULL, sd.ratio=2)
In a typical pretest-posttest RCT, subjects are randomized to two treatments, and response is measured at baseline, prior to intervention with the randomized treatment (pretest), and at prespecified follow-up time (posttest). Interest focuses on the effect of treatments on the change between mean baseline and follow-up response. Missing posttest response for some subjects is routine, and disregarding missing cases can lead to invalid inference.
prepost.test(baseline, post, treatment, conf.level = 0.95, delta = "estimate")
prepost.test(baseline, post, treatment, conf.level = 0.95, delta = "estimate")
baseline |
A vector of quantitative baseline measurements |
post |
A vector of quantitative post-test measurements with same length as baseline. May contain missing values |
treatment |
A vector of 0s and 1s corresponding to treatment indicator. 1 = treated, Same length as baseline |
conf.level |
confidence level of the interval |
delta |
A numeric between 0 and 1 OR the string "estimate" (the default). The proportion of observation treated. |
Claus Ekstrom [email protected]
Marie Davidian, Anastasios A. Tsiatis and Selene Leon (2005). "Semiparametric Estimation of Treatment Effect in a Pretest-Posttest Study with Missing Data". Statistical Science 20, 261-301.
# From Altman expo = c(rep(1,9),rep(0,7)) bp1w = c(137,120,141,137,140,144,134,123,142,139,134,136,151,147,137,149) bp_base = c(147,129,158,164,134,155,151,141,153,133,129,152,161,154,141,156) diff = bp1w-bp_base prepost.test(bp_base, bp1w, expo)
# From Altman expo = c(rep(1,9),rep(0,7)) bp1w = c(137,120,141,137,140,144,134,123,142,139,134,136,151,147,137,149) bp_base = c(147,129,158,164,134,155,151,141,153,133,129,152,161,154,141,156) diff = bp1w-bp_base prepost.test(bp_base, bp1w, expo)
Fast extraction of matrix diagonal
qdiag(x)
qdiag(x)
x |
The matrix to extract the diagonal from |
Note this function can only be used for extraction
A vector with the diagonal elements
Claus Ekstrom <[email protected]>
Function for calculating the quasi-likelihood under the independence model information criterion (QIC), quasi-likelihood, correlation information criterion (CIC), and corrected QIC for one or several fitted geeglm model object from the geepack package.
## S3 method for class 'geeglm' QIC(object, tol = .Machine$double.eps, ...) ## S3 method for class 'ordgee' QIC(object, tol = .Machine$double.eps, ...) ## S3 method for class 'geekin' QIC(object, tol = .Machine$double.eps, ...) QIC(object, tol = .Machine$double.eps, ...)
## S3 method for class 'geeglm' QIC(object, tol = .Machine$double.eps, ...) ## S3 method for class 'ordgee' QIC(object, tol = .Machine$double.eps, ...) ## S3 method for class 'geekin' QIC(object, tol = .Machine$double.eps, ...) QIC(object, tol = .Machine$double.eps, ...)
object |
a fitted GEE model from the geepack package. Currently only works on geeglm objects |
tol |
the tolerance used for matrix inversion |
... |
optionally more fitted geeglm model objects |
QIC is used to select a correlation structure. The QICu is used to compare models that have the same working correlation matrix and the same quasi-likelihood form but different mean specifications. CIC has been suggested as a more robust alternative to QIC when the model for the mean may not fit the data very well and when models with different correlation structures are compared.
Models with smaller values of QIC, CIC, QICu, or QICC are preferred.
If the MASS package is loaded then the ginv
function is used
for matrix inversion. Otherwise the standard solve
function is
used.
A vector or matrix with the QIC, QICu, quasi likelihood, CIC, the number of mean effect parameters, and the corrected QIC for each GEE object
Claus Ekstrom [email protected], Brian McLoone [email protected], and Steven Orzack [email protected]
Pan, W. (2001). Akaike's information criterion in
generalized estimating equations. Biometrics, 57, 120-125.
Hardin, J.W.
and Hilbe, J.M. (2012). Generalized Estimating Equations, 2nd
Edition, Chapman and Hall/CRC: New York.
Hin, L.-Y. and Wang, Y-G.
(2009). Working-correlation-structure identification in generalized
estimating equations, Statistics in Medicine 28: 642-658.
Thall, P.F.
and Vail, S.C. (1990). Some Covariance Models for Longitudinal Count
Data with Overdispersion. Biometrics, 46, 657-671.
geeglm
library(geepack) data(ohio) fit <- geeglm(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="exch", scale.fix=TRUE) QIC(fit)
library(geepack) data(ohio) fit <- geeglm(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="exch", scale.fix=TRUE) QIC(fit)
Gene expression levels from real-time quantitative polymerase chain reaction (qPCR) experiments on two different plant lines. Each line was used for 7 experiments each with 45 cycles.
A data frame with 630 observations on the following 4 variables.
flour |
numeric | Fluorescence level |
line |
factor | Plant lines rnt (mutant) and wt
(wildtype) |
cycle |
numeric | Cycle number for the experiment |
transcript
|
factor | Transcript used for the different runs |
Data provided by Kirsten Jorgensen <[email protected]>.
Added by Claus Ekstrom <[email protected]>
Morant, M. et al. (2010). Metabolomic, Transcriptional, Hormonal and Signaling Cross-Talk in Superroot2. Molecular Plant. 3, p.192–211.
data(qpcr) # # Analyze a single run for the wt line, transcript 1 # run1 <- subset(qpcr, transcript==1 & line=="wt") model <- nls(flour ~ fmax/(1+exp(-(cycle-c)/b))+fb, start=list(c=25, b=1, fmax=100, fb=0), data=run1) print(model) plot(run1$cycle, run1$flour, xlab="Cycle", ylab="Fluorescence") lines(run1$cycle, predict(model))
data(qpcr) # # Analyze a single run for the wt line, transcript 1 # run1 <- subset(qpcr, transcript==1 & line=="wt") model <- nls(flour ~ fmax/(1+exp(-(cycle-c)/b))+fb, start=list(c=25, b=1, fmax=100, fb=0), data=run1) print(model) plot(run1$cycle, run1$flour, xlab="Cycle", ylab="Fluorescence") lines(run1$cycle, predict(model))
Fast computation of a quadratic form .
quadform(x, M, invertM = FALSE, transposex = FALSE)
quadform(x, M, invertM = FALSE, transposex = FALSE)
x |
A matrix with dimensions n*k. |
M |
A matrix with dimenions n*n. If it is to be inverted then the matrix should be symmetric and positive difinite (no check is done for this) |
invertM |
A logical. If set to TRUE then M will be inverted before computations (defaults to FALSE) |
transposex |
A logical. Should the matrix be transposed before computations (defaults to FALSE). |
A matrix with dimensions k * k giving the quadratic form
Claus Ekstrom <[email protected]>
Five raters were asked to guess the number of points in a swarm for 10 different figures (which - unknown to the raters - were each repeated three times).
A data frame with 30 observations on the following 6 variables.
The true number of points in the swarm. Each picture is replicated thrice
Ratings from judge 1
Ratings from judge 2
Ratings from judge 3
Ratings from judge 4
Ratings from judge 5
The raters har approximately 10 seconds to judge each picture, and the thought it was 30 different pictures. Before starting the experiment they were shown 6 (unrelated) pictures and were told the number of points in each of those pictures. The SAND column contains the picture id and the true number of points in the swarm.
Collected by Claus Ekstrom.
data(rainman) long <- data.frame(stack(rainman[,2:6]), figure=factor(rep(rainman$SAND,5))) figind <- interaction(long$figure,long$ind) # Use a linear random effect model from the # lme4 package if available if(require(lme4)) { model <- lmer(values ~ (1|ind) + (1|figure) + (1|figind), data=long) } # # Point swarms were generated by the following program # ## Not run: set.seed(2) # Original npoints <- sample(4:30)*4 nplots <- 10 pdf(file="swarms.pdf", onefile=TRUE) s1 <- sample(npoints[1:nplots]) print(s1) for (i in 1:nplots) { n <- s1[i] set.seed(n) x <- runif(n) y <- runif(n) plot(x,y, xlim=c(-.15, 1.15), ylim=c(-.15, 1.15), pch=20, axes=FALSE, xlab="", ylab="") } s1 <- sample(npoints[1:nplots]) print(s1) for (i in 1:nplots) { n <- s1[i] set.seed(n) x <- runif(n) y <- runif(n) plot(y,x, xlim=c(-.15, 1.15), ylim=c(-.15, 1.15), pch=20, axes=FALSE, xlab="", ylab="") } s1 <- sample(npoints[1:nplots]) print(s1) for (i in 1:nplots) { n <- s1[i] set.seed(n) x <- runif(n) y <- runif(n) plot(-x,y, xlim=c(-1.15, .15), ylim=c(-.15, 1.15), pch=20, axes=FALSE, xlab="", ylab="") } dev.off() ## End(Not run)
data(rainman) long <- data.frame(stack(rainman[,2:6]), figure=factor(rep(rainman$SAND,5))) figind <- interaction(long$figure,long$ind) # Use a linear random effect model from the # lme4 package if available if(require(lme4)) { model <- lmer(values ~ (1|ind) + (1|figure) + (1|figind), data=long) } # # Point swarms were generated by the following program # ## Not run: set.seed(2) # Original npoints <- sample(4:30)*4 nplots <- 10 pdf(file="swarms.pdf", onefile=TRUE) s1 <- sample(npoints[1:nplots]) print(s1) for (i in 1:nplots) { n <- s1[i] set.seed(n) x <- runif(n) y <- runif(n) plot(x,y, xlim=c(-.15, 1.15), ylim=c(-.15, 1.15), pch=20, axes=FALSE, xlab="", ylab="") } s1 <- sample(npoints[1:nplots]) print(s1) for (i in 1:nplots) { n <- s1[i] set.seed(n) x <- runif(n) y <- runif(n) plot(y,x, xlim=c(-.15, 1.15), ylim=c(-.15, 1.15), pch=20, axes=FALSE, xlab="", ylab="") } s1 <- sample(npoints[1:nplots]) print(s1) for (i in 1:nplots) { n <- s1[i] set.seed(n) x <- runif(n) y <- runif(n) plot(-x,y, xlim=c(-1.15, .15), ylim=c(-.15, 1.15), pch=20, axes=FALSE, xlab="", ylab="") } dev.off() ## End(Not run)
Fast generation of a matrix by replicating a matrix row- and column-wise in a block-like fashion
repmat(x, nrow = 1L, ncol = 1L)
repmat(x, nrow = 1L, ncol = 1L)
x |
A matrix with dimensions r*c. |
nrow |
An integer giving the number of times the matrix is replicated row-wise |
ncol |
An integer giving the number of times the matrix is replicated column-wise |
A matrix with dimensions (r*nrow) x (c*ncol)
Claus Ekstrom <[email protected]>
m <- matrix(1:6, ncol=3) repmat(m, 2) # Stack two copies of m on top of each other repmat(m, 2, 3) # Replicate m with two copies on top and three copies side-by-side
m <- matrix(1:6, ncol=3) repmat(m, 2) # Stack two copies of m on top of each other repmat(m, 2, 3) # Replicate m with two copies on top and three copies side-by-side
Plots a standardized residual plot from an lm or glm object and provides additional graphics to help evaluate the variance homogeneity and mean.
residual_plot( x, y = NULL, candy = TRUE, bandwidth = 0.3, xlab = "Fitted values", ylab = "Std.res.", col.sd = "blue", alpha = 0.1, ylim = NA, ... ) ## Default S3 method: residual_plot( x, y = NULL, candy = TRUE, bandwidth = 0.3, xlab = "Fitted values", ylab = "Std.res.", col.sd = "blue", alpha = 0.1, ylim = NA, ... ) ## S3 method for class 'lm' residual_plot( x, y, candy = TRUE, bandwidth = 0.3, xlab = "Fitted values", ylab = "Stud.res.", col.sd = "blue", alpha = 0.1, ... ) ## S3 method for class 'glm' residual_plot( x, y, candy = TRUE, bandwidth = 0.4, xlab = "Fitted values", ylab = "Std. dev. res.", col.sd = "blue", alpha = 0.1, ... )
residual_plot( x, y = NULL, candy = TRUE, bandwidth = 0.3, xlab = "Fitted values", ylab = "Std.res.", col.sd = "blue", alpha = 0.1, ylim = NA, ... ) ## Default S3 method: residual_plot( x, y = NULL, candy = TRUE, bandwidth = 0.3, xlab = "Fitted values", ylab = "Std.res.", col.sd = "blue", alpha = 0.1, ylim = NA, ... ) ## S3 method for class 'lm' residual_plot( x, y, candy = TRUE, bandwidth = 0.3, xlab = "Fitted values", ylab = "Stud.res.", col.sd = "blue", alpha = 0.1, ... ) ## S3 method for class 'glm' residual_plot( x, y, candy = TRUE, bandwidth = 0.4, xlab = "Fitted values", ylab = "Std. dev. res.", col.sd = "blue", alpha = 0.1, ... )
x |
lm object or a numeric vector |
y |
numeric vector for the y axis values |
candy |
logical. Should a lowess curve and local standard deviation of
the residual be added to the plot. Defaults to |
bandwidth |
The width of the window used to calculate the local smoothed version of the mean and the variance. Value should be between 0 and 1 and determines the percentage of the window width used |
xlab |
x axis label |
ylab |
y axis label |
col.sd |
color for the background residual deviation |
alpha |
number between 0 and 1 determining the transprency of the standard deviation plotting color |
ylim |
pair of observations that set the minimum and maximum of the y axis. If set to NA (the default) then the limits are computed from the data. |
... |
Other arguments passed to the plot function |
The y axis shows the studentized residuals (for lm objects) or standardized deviance residuals (for glm objects). The x axis shows the linear predictor, i.e., the predicted values for lm objects.
The blue area is a smoothed estimate of 1.96*SD of the standardized residuals in a window around the predicted value. The blue area should largely be rectangular if the standardized residuals have more or less the same variance.
The dashed line shows the smoothed mean of the standardized residuals and should generally follow the horizontal line through (0,0).
Solid circles correspond to standardized residuals outside the range from [-1.96; 1.96] while open circles are inside that interval. Roughly 5
Produces a standardized residual plot
Claus Ekstrom <[email protected]>
# Linear regression example data(trees) model <- lm(Volume ~ Girth + Height, data=trees) residual_plot(model) model2 <- lm(Volume ~ Girth + I(Girth^2) + Height, data=trees) residual_plot(model2) # Add extra information about points by adding geom_text to the object produced m <- lm(mpg ~ hp + factor(vs), data=mtcars) residual_plot(m) + ggplot2::geom_point(ggplot2::aes(color=factor(cyl)), data=mtcars)
# Linear regression example data(trees) model <- lm(Volume ~ Girth + Height, data=trees) residual_plot(model) model2 <- lm(Volume ~ Girth + I(Girth^2) + Height, data=trees) residual_plot(model2) # Add extra information about points by adding geom_text to the object produced m <- lm(mpg ~ hp + factor(vs), data=mtcars) residual_plot(m) + ggplot2::geom_point(ggplot2::aes(color=factor(cyl)), data=mtcars)
Plots a standardized residual plot from an lm or glm object and provides additional graphics to help evaluate the variance homogeneity and mean.
## Default S3 method: residualplot( x, y = NULL, candy = TRUE, bandwidth = 0.3, xlab = "Fitted values", ylab = "Std.res.", col.sd = "blue", col.alpha = 0.3, ylim = NA, ... ) ## S3 method for class 'lm' residualplot( x, y, candy = TRUE, bandwidth = 0.3, xlab = "Fitted values", ylab = "Stud.res.", col.sd = "blue", col.alpha = 0.3, ... ) ## S3 method for class 'glm' residualplot( x, y, candy = TRUE, bandwidth = 0.4, xlab = "Fitted values", ylab = "Std. dev. res.", col.sd = "blue", col.alpha = 0.3, ... ) residualplot( x, y = NULL, candy = TRUE, bandwidth = 0.3, xlab = "Fitted values", ylab = "Std.res.", col.sd = "blue", col.alpha = 0.3, ylim = NA, ... )
## Default S3 method: residualplot( x, y = NULL, candy = TRUE, bandwidth = 0.3, xlab = "Fitted values", ylab = "Std.res.", col.sd = "blue", col.alpha = 0.3, ylim = NA, ... ) ## S3 method for class 'lm' residualplot( x, y, candy = TRUE, bandwidth = 0.3, xlab = "Fitted values", ylab = "Stud.res.", col.sd = "blue", col.alpha = 0.3, ... ) ## S3 method for class 'glm' residualplot( x, y, candy = TRUE, bandwidth = 0.4, xlab = "Fitted values", ylab = "Std. dev. res.", col.sd = "blue", col.alpha = 0.3, ... ) residualplot( x, y = NULL, candy = TRUE, bandwidth = 0.3, xlab = "Fitted values", ylab = "Std.res.", col.sd = "blue", col.alpha = 0.3, ylim = NA, ... )
x |
lm object or a numeric vector |
y |
numeric vector for the y axis values |
candy |
logical. Should a lowess curve and local standard deviation of
the residual be added to the plot. Defaults to |
bandwidth |
The width of the window used to calculate the local smoothed version of the mean and the variance. Value should be between 0 and 1 and determines the percentage of the window width used |
xlab |
x axis label |
ylab |
y axis label |
col.sd |
color for the background residual deviation |
col.alpha |
number between 0 and 1 determining the transprency of the standard deviation plotting color |
ylim |
pair of observations that set the minimum and maximum of the y axis. If set to NA (the default) then the limits are computed from the data. |
... |
Other arguments passed to the plot function |
The y axis shows the studentized residuals (for lm objects) or standardized deviance residuals (for glm objects). The x axis shows the linear predictor, i.e., the predicted values for lm objects.
The blue area is a smoothed estimate of 1.96*SD of the standardized residuals in a window around the predicted value. The blue area should largely be rectangular if the standardized residuals have more or less the same variance.
The dashed line shows the smoothed mean of the standardized residuals and should generally follow the horizontal line through (0,0).
Solid circles correspond to standardized residuals outside the range from [-1.96; 1.96] while open circles are inside that interval. Roughly 5
Produces a standardized residual plot
Claus Ekstrom <[email protected]>
# Linear regression example data(trees) model <- lm(Volume ~ Girth + Height, data=trees) residualplot(model) model2 <- lm(Volume ~ Girth + I(Girth^2) + Height, data=trees) residualplot(model2)
# Linear regression example data(trees) model <- lm(Volume ~ Girth + Height, data=trees) residualplot(model) model2 <- lm(Volume ~ Girth + I(Girth^2) + Height, data=trees) residualplot(model2)
Simulates residual multivariate t-distributed response data from a pedigree where the additive genetic, dominance genetic, and shared environmental effects are taken into account.
rmvt.pedigree(n = 1, pedigree, h2 = 0, c2 = 0, d2 = 0, df = 1)
rmvt.pedigree(n = 1, pedigree, h2 = 0, c2 = 0, d2 = 0, df = 1)
n |
numeric. The number of simulations to generate |
pedigree |
a |
h2 |
numeric. The heritability |
c2 |
numeric. The environmentability |
d2 |
numeric. The dominance deviance effect |
df |
numeric. The degrees of freedom for the t distribution |
The three parameters should have a sum: h2+c2+d2 that is less than 1. The total variance is set to 1, and the mean is zero.
Returns a matrix with the simulated values with n columns (one for each simulation) and each row matches the corresponding individual from the pedigree
Claus Ekstrom [email protected]
pedigree
, kinship
,
library(kinship2) library(mvtnorm) mydata <- data.frame(id=1:5, dadid=c(NA, NA, 1, 1, 1), momid=c(NA, NA, 2, 2, 2), sex=c("male", "female", "male", "male", "male"), famid=c(1,1,1,1,1)) relation <- data.frame(id1=c(3), id2=c(4), famid=c(1), code=c(1)) ped <- pedigree(id=mydata$id, dadid=mydata$dadid, momid=mydata$momid, sex=mydata$sex, relation=relation) rmvt.pedigree(2, ped, h2=.25, df=4)
library(kinship2) library(mvtnorm) mydata <- data.frame(id=1:5, dadid=c(NA, NA, 1, 1, 1), momid=c(NA, NA, 2, 2, 2), sex=c("male", "female", "male", "male", "male"), famid=c(1,1,1,1,1)) relation <- data.frame(id1=c(3), id2=c(4), famid=c(1), code=c(1)) ped <- pedigree(id=mydata$id, dadid=mydata$dadid, momid=mydata$momid, sex=mydata$sex, relation=relation) rmvt.pedigree(2, ped, h2=.25, df=4)
Simulates residual multivariate Gaussian response data from a pedigree where the additive genetic, dominance genetic, and shared environmental effects are taken into account.
rmvtnorm.pedigree(n = 1, pedigree, h2 = 0, c2 = 0, d2 = 0)
rmvtnorm.pedigree(n = 1, pedigree, h2 = 0, c2 = 0, d2 = 0)
n |
numeric. The number of simulations to generate |
pedigree |
a |
h2 |
numeric. The heritability |
c2 |
numeric. The environmentability |
d2 |
numeric. The dominance deviance effect |
The three parameters should have a sum: h2+c2+d2 that is less than 1. The total variance is set to 1, and the mean is zero.
Returns a matrix with the simulated values with n columns (one for each simulation) and each row matches the corresponding individual from the pedigree
Claus Ekstrom [email protected]
pedigree
, kinship
,
library(kinship2) library(mvtnorm) mydata <- data.frame(id=1:5, dadid=c(NA, NA, 1, 1, 1), momid=c(NA, NA, 2, 2, 2), sex=c("male", "female", "male", "male", "male"), famid=c(1,1,1,1,1)) relation <- data.frame(id1=c(3), id2=c(4), famid=c(1), code=c(1)) ped <- pedigree(id=mydata$id, dadid=mydata$dadid, momid=mydata$momid, sex=mydata$sex, relation=relation) rmvtnorm.pedigree(2, ped, h2=.25)
library(kinship2) library(mvtnorm) mydata <- data.frame(id=1:5, dadid=c(NA, NA, 1, 1, 1), momid=c(NA, NA, 2, 2, 2), sex=c("male", "female", "male", "male", "male"), famid=c(1,1,1,1,1)) relation <- data.frame(id1=c(3), id2=c(4), famid=c(1), code=c(1)) ped <- pedigree(id=mydata$id, dadid=mydata$dadid, momid=mydata$momid, sex=mydata$sex, relation=relation) rmvtnorm.pedigree(2, ped, h2=.25)
Random generation for a perfect normal distribution with mean equal to mean and standard deviation equal to sd.
rnorm_perfect(n, mean = 0, sd = 1)
rnorm_perfect(n, mean = 0, sd = 1)
n |
number of observations. If length(n) > 1, the length is taken to be the number required. |
mean |
number of mean. |
sd |
number of standard deviation. |
The function will return the same set of quantiles for fixed n. In that sense there is not much randomness going on, and the function is mostly useful for illustrative purposes.
Returns a vector of values from a perfect normal distribution
Claus Ekstrom [email protected]
rnorm_perfect(30, mean=10, sd=2)
rnorm_perfect(30, mean=10, sd=2)
Create a hanging rootogram for a quantitative numeric vector and compare it to a Gaussian distribution.
rootonorm( x, breaks = "Sturges", type = c("hanging", "deviation"), scale = c("sqrt", "raw"), zeroline = TRUE, linecol = "red", rectcol = "lightgrey", xlab = xname, ylab = "Sqrt(frequency)", yaxt = "n", ylim = NULL, mu = mean(x), s = sd(x), gap = 0.1, ... )
rootonorm( x, breaks = "Sturges", type = c("hanging", "deviation"), scale = c("sqrt", "raw"), zeroline = TRUE, linecol = "red", rectcol = "lightgrey", xlab = xname, ylab = "Sqrt(frequency)", yaxt = "n", ylim = NULL, mu = mean(x), s = sd(x), gap = 0.1, ... )
x |
a numeric vector of values for which the rootogram is desired |
breaks |
Either the character string ‘Sturges’ to use Sturges' algorithm to decide the number of breaks or a positive integer that sets the number of breaks. |
type |
if |
scale |
The type of transformation. Defaults to |
zeroline |
logical; if |
linecol |
The color of the density line for the normal distribution.
The default is to make a |
rectcol |
a colour to be used to fill the bars. The default of
|
xlab , ylab
|
plot labels. The |
yaxt |
Should y axis text be printed. Defaults to |
ylim |
the range of y values with sensible defaults. |
mu |
the mean of the Gaussian distribution. Defaults to the sample mean
of |
s |
the standard deivation of the Gaussian distribution. Defaults to
the sample std.dev. of |
gap |
The distance between the rectangles in the histogram. |
... |
further arguments and graphical parameters passed to
|
The mean and standard deviation of the Gaussian distribution are calculated
from the observed data unless the mu
and s
arguments are
given.
Returns a vector of counts of each bar. This may be changed in the future. The plot is the primary output of the function.
Claus Ekstrom [email protected]
Tukey, J. W. 1972. Some Graphic and Semigraphic Displays. In Statistical Papers in Honor of George W. Snedecor, p. 293-316.
oldpar <- par() par(mfrow=c(2,2)) rootonorm(rnorm(200)) rootonorm(rnorm(200), type="deviation", scale="raw") rootonorm(rnorm(200), mu=1) rootonorm(rexp(200), mu=1) par(oldpar)
oldpar <- par() par(mfrow=c(2,2)) rootonorm(rnorm(200)) rootonorm(rnorm(200), type="deviation", scale="raw") rootonorm(rnorm(200), mu=1) rootonorm(rexp(200), mu=1) par(oldpar)
Rounds a vector of numeric values to percentages ensuring that they add up to 100
round_percent(x, decimals = 0L, ties = c("random", "last"))
round_percent(x, decimals = 0L, ties = c("random", "last"))
x |
A numeric vector with non-negative values. |
decimals |
An integer giving the number of decimals that are used |
ties |
A string that is either 'random' (the default) or 'last'. Determines how to break ties. Random is random, last prefers to break ties at the last position |
Returns a vector of numeric values.
Returns a numeric vector of the same length as x
Claus Ekstrom [email protected]
f <- c(1,2,1,3,2,1,2,3,1) round_percent(f)
f <- c(1,2,1,3,2,1,2,3,1) round_percent(f)
Simulates a randomized treatment based on an urn model.
rud( n, alpha = c(1, 1), beta = 1, labels = seq(1, length(alpha)), data.frame = FALSE, startid = 1 )
rud( n, alpha = c(1, 1), beta = 1, labels = seq(1, length(alpha)), data.frame = FALSE, startid = 1 )
n |
the number of individuals to randomize |
alpha |
a non-negative integer vector of weights for each treatment group. The length of the vector corresponds to the number of treatment groups. |
beta |
a non-negative integer of weights added to the groups that were not given treatment |
labels |
a vector of treatment labels. Must be the same length as the length of alpha. |
data.frame |
A logical that determines if the function should return a vector of group indices (the default, if FALSE) or a data frame (if TRUE). |
startid |
margin paramaters; vector of length 4 (see |
The urn model can be described as follows: For k different treatments, the urn design is initiated with a number of balls in an urn corresponding to the start weight (the alpha argument), where each treatment has a specific colour. Whenever a patient arrives, a random ball is drawn from the urn and the colour decides the treatment for the patient. For each of the treatments that weren't chosen we add beta balls of the corresponding colour(s) to the urn to update the probabilities for the next patient.
A vector with group indices. If the argument data.frame=TRUE
is used then a data frame with three variables is returned: id, group, and treatment (the group label).
rud(5) rud(5, alpha=c(1,1,10), beta=5)
rud(5) rud(5, alpha=c(1,1,10), beta=5)
Internal functions for the MESS package
scorefct(o, beta = NULL, testidx = NULL, sas = FALSE)
scorefct(o, beta = NULL, testidx = NULL, sas = FALSE)
o |
input geepack object from a geeglm fit. |
beta |
The estimated parameters. If set to |
testidx |
Indices of the beta parameters that should be tested equal to zero |
sas |
Logical. Should the SAS version of the score test be computed. Defaults to |
Claus Ekstrom [email protected]
Expands a contingency table to a data frame where each observation in the table becomes a single observation in the data frame with corresponding information for each for each combination of the table dimensions.
screen_variables(x, y, lambda = 0.1, method = c("global-strong", "global-DPP"))
screen_variables(x, y, lambda = 0.1, method = c("global-strong", "global-DPP"))
x |
A table or matrix |
y |
A vector of outcomes |
lambda |
a vector of positive values used for the penalization parameter. |
method |
a string giving the method used for screening. Two possibilities are "global-strong" and "global-DPP" |
Note that no standardization is done (not necessary?)
A list with three elements: lambda which contains the lambda values, selected which contains the indices of the selected variables, and method a string listing the method used.
Claus Ekstrom [email protected]
Hastie, Tibshirani and Wainwright (2015). "Statistical Learning with Sparsity". CRC Press.
x <- matrix(rnorm(50*100), nrow=50) y <- rnorm(50, mean=x[,1]) screen_variables(x, y, lambda=c(.1, 1, 2))
x <- matrix(rnorm(50*100), nrow=50) y <- rnorm(50, mean=x[,1]) screen_variables(x, y, lambda=c(.1, 1, 2))
Segregate di-allelic genes down through the generations of a pedigree. It is assumed that the founders are independent and that the genes are in Hardy Weinberg equilibrium in the population.
segregate.genes(pedigree, maf)
segregate.genes(pedigree, maf)
pedigree |
a |
maf |
a vector of minor allele frequencies for each diallelic gene to segregate through the pedigree |
Returns a data frame. Each row matches the order of the individuals in the pedigree and each column corresponds to each of the segregated genes. The data frame contains values 0, 1, or 2 corresponding to the number of copies of the minor allele frequency allele that person has.
Claus Ekstrom [email protected]
pedigree
, kinship
,
library(kinship2) mydata <- data.frame(id=1:5, dadid=c(NA, NA, 1, 1, 1), momid=c(NA, NA, 2, 2, 2), sex=c("male", "female", "male", "male", "male"), famid=c(1,1,1,1,1)) relation <- data.frame(id1=c(3), id2=c(4), famid=c(1), code=c(1)) ped <- pedigree(id=mydata$id, dadid=mydata$dadid, momid=mydata$momid, sex=mydata$sex, relation=relation) segregate.genes(ped, c(.1, .3, .5))
library(kinship2) mydata <- data.frame(id=1:5, dadid=c(NA, NA, 1, 1, 1), momid=c(NA, NA, 2, 2, 2), sex=c("male", "female", "male", "male", "male"), famid=c(1,1,1,1,1)) relation <- data.frame(id1=c(3), id2=c(4), famid=c(1), code=c(1)) ped <- pedigree(id=mydata$id, dadid=mydata$dadid, momid=mydata$momid, sex=mydata$sex, relation=relation) segregate.genes(ped, c(.1, .3, .5))
Inverts a symmetric positive-definite matrix without requiring the Matrix package.
sinv(obj)
sinv(obj)
obj |
The symmetric positive-definite matrix |
This function does no error checking and it is up to the user to ensure that the input is indeed symmetric, positive-definite, and a matrix.
A matrix of the same size as the input object
Claus Ekstrom, [email protected].
m <- matrix(c(1, 0, .5, .5, 0, 1, .5, .5, .5, .5, 1, .5, .5, .5, .5, 1), 4) sinv(m)
m <- matrix(c(1, 0, .5, .5, 0, 1, .5, .5, .5, .5, 1, .5, .5, .5, .5, 1), 4) sinv(m)
Effect of smoking at 45 years of age on self reported health five years later. Data are on a sample of males from the Glostrup survey.
A table with daily smoking categories for the rows and self reported health five years later as the columns.
Data example found on the internet but originates from Svend Kreiner
data(smokehealth) m <- smokehealth m[,3] <- m[,3]+ m[,4] m[4,] <- m[4,] + m[5,] m <- m[1:4,1:3] gkgamma(m) chisq.test(m)
data(smokehealth) m <- smokehealth m[,3] <- m[,3]+ m[,4] m[4,] <- m[4,] + m[5,] m <- m[1:4,1:3] gkgamma(m) chisq.test(m)
Players on the Danish national soccer team. The dataset consists of all players who have been picked to play on the men's senior A-team, their position, date-of-birth, goals and matches.
A data frame with 805 observations on the following 5 variables.
a factor with names of the players
a Date. The date-of-birth of the player
a factor with levels Forward
Defender
Midfielder
Goalkeeper
a numeric vector. The number of A matches played by the player
a numeric vector. The number of goals scored by the player in A matches
Data collected from the player database of DBU on March 21st, 2014. See https://www.dbu.dk for more information.
data(soccer) birthmonth <- as.numeric(format(soccer$DoB, "%m")) birthyear <- as.numeric(format(soccer$DoB, "%Y"))
data(soccer) birthmonth <- as.numeric(format(soccer$DoB, "%m")) birthyear <- as.numeric(format(soccer$DoB, "%Y"))
Gene expression levels from two-color dye-swap experiment on 6 microarrays. Arrays 1 and 2 represent the first biological sample (ie, the first dye swap), 3 and 4 the second, and arrays 5 and 6 the third.
A data frame with 258000 observations on the following 5 variables.
a factor with levels green
red
representing the dye used for the gene expression
a
factor with levels 1
2
3
4
5
6
corresponding to the 6 arrays
a factor with 21500 levels representing the genes on the arrays
a factor
with levels rnt
wt
for the two types of plants: runts and wild
type
a numeric vector with the gene expression level (normalized but not log transformed)
Data provided by Soren Bak <[email protected]>.
Added by Claus
Ekstrom <[email protected]>
Morant, M. et al. (2010). Metabolomic, Transcriptional, Hormonal and Signaling Cross-Talk in Superroot2. Molecular Plant. 3, p.192–211.
data(superroot2) # Select one gene g1 <- superroot2[superroot2$gene=="AT2G24000.1",] model <- lm(log(signal) ~ plant + color + array, data=g1) summary(model)
data(superroot2) # Select one gene g1 <- superroot2[superroot2$gene=="AT2G24000.1",] model <- lm(log(signal) ~ plant + color + array, data=g1) summary(model)
Fast computation of the trace of the matrix product trace(t(A)
tracemp(A, B)
tracemp(A, B)
A |
A matrix with dimensions n*k. |
B |
A matrix with dimenions n*k. |
The trace of the matrix product
Claus Ekstrom <[email protected]>
A <- matrix(1:12, ncol=3) tracemp(A, A)
A <- matrix(1:12, ncol=3) tracemp(A, A)
This function computes the unbiased standard deviation of the values in x. If na.rm
is TRUE then missing values are removed before computation proceeds.
usd(x, na.rm = FALSE)
usd(x, na.rm = FALSE)
x |
a numeric vector or an R object but not a factor coercible to numeric by as.double(x) |
na.rm |
logical. Should missing values be removed? |
Like var this uses denominator n - 1. The standard deviation of a length-one or zero-length vector is NA.
A scalar
sd(1:5) usd(1:5)
sd(1:5) usd(1:5)
Produces a 3x3 grid of residual- or qq-plots plots from a lm object. One of the nine subfigures is the true residual plot/qqplot while the remaining are plots that fulfill the assumptions of the linear model
## Default S3 method: wallyplot( x, y = x, FUN = residualplot, hide = TRUE, simulateFunction = rnorm, model = NULL, ... ) ## S3 method for class 'lm' wallyplot( x, y = x, FUN = residualplot, hide = TRUE, simulateFunction = lmsimresiduals, ... ) wallyplot( x, y = x, FUN = residualplot, hide = TRUE, simulateFunction = rnorm, ... )
## Default S3 method: wallyplot( x, y = x, FUN = residualplot, hide = TRUE, simulateFunction = rnorm, model = NULL, ... ) ## S3 method for class 'lm' wallyplot( x, y = x, FUN = residualplot, hide = TRUE, simulateFunction = lmsimresiduals, ... ) wallyplot( x, y = x, FUN = residualplot, hide = TRUE, simulateFunction = rnorm, ... )
x |
a numeric vector of x values, or an lm object. |
y |
a numeric vector of y values of the same length as x or a n * 9 matrix of y values - one column for each of the nine plots to make. The first column is the one corresponding to the results from the dataset |
FUN |
a function that accepts an |
hide |
logical; if |
simulateFunction |
The function used to produce y values under the null hypothesis. Defaults to rnorm |
model |
Optional input to simulateFunction |
... |
Other arguments passed to the plot function |
Users who look at residual plots or qqnorm plots for the first time often feel they lack the experience to determine if the residual plot is okay or if the model assumptions are indeed violated. One way to convey "experience" is to plot a series of graphical model validation plots simulated under the model assumption together with the corresponding plot from the real data and see if the user can pinpoint one of them that looks like an odd-one-out. If the proper plot from the real data does not stand out then the assumptions are not likely to be violated.
The Wallyplot produces a 3x3 grid of plots from a lm object or from a set of pairs of x and y values. One of the nine subfigures is the true plot while the remaining are plots that fulfill the assumptions of the linear model. After the user interactively hits a key the correct residual plot (correponding to the provided data) is shown.
The plotting function can be set using the FUN
argument which should
be a function that accepts x
, y
and ...
arguments and
plots the desired figure. When y
is a single vector the same length
as x
then the function simulateFunction
is used to generate
the remaining y values corresponding the situations under the null.
For a description of the features of the default residual plot see the help page for residualplot
.
Claus Ekstrom [email protected]
Ekstrom, CT (2014) Teaching 'Instant Experience' with Graphical Model Validation Techniques. Teaching Statistics (36), p 23-26
## Not run: data(trees) res <- lm(Volume ~ Height + Girth, data=trees) wallyplot(res) # Create a grid of QQ-plot figures # Define function to plot a qq plot with an identity line qqnorm.wally <- function(x, y, ...) { qqnorm(y, ...) ; abline(a=0, b=1) } wallyplot(res, FUN=qqnorm.wally, main="") # Define function to simulate components+residuals for Girth cprsimulate <- function(n) {rnorm(n)+trees$Girth} # Create the cpr plotting function cprplot <- function(x, y, ...) {plot(x, y, pch=20, ...) ; lines(lowess(x, y), lty=3)} # Create the Wallyplot wallyplot(trees$Girth, trees$Girth+rstudent(res), FUN=cprplot, simulateFunction=cprsimulate, xlab="Girth") ## End(Not run)
## Not run: data(trees) res <- lm(Volume ~ Height + Girth, data=trees) wallyplot(res) # Create a grid of QQ-plot figures # Define function to plot a qq plot with an identity line qqnorm.wally <- function(x, y, ...) { qqnorm(y, ...) ; abline(a=0, b=1) } wallyplot(res, FUN=qqnorm.wally, main="") # Define function to simulate components+residuals for Girth cprsimulate <- function(n) {rnorm(n)+trees$Girth} # Create the cpr plotting function cprplot <- function(x, y, ...) {plot(x, y, pch=20, ...) ; lines(lowess(x, y), lty=3)} # Create the Wallyplot wallyplot(trees$Girth, trees$Girth+rstudent(res), FUN=cprplot, simulateFunction=cprsimulate, xlab="Girth") ## End(Not run)
Writes the data frame to a file in the XML format.
write.xml(data, file = NULL, collapse = TRUE)
write.xml(data, file = NULL, collapse = TRUE)
data |
the data frame object to save |
file |
the file name to be written to. |
collapse |
logical. Should the output file be collapsed to make it fill less? (Defaults to TRUE) |
This function does not require the XML package to be installed to function properly.
None
Claus Ekstrom, [email protected] based on previous work by Duncan Temple Lang.
## Not run: data(trees) write.xml(trees, file="mydata.xml") ## End(Not run)
## Not run: data(trees) write.xml(trees, file="mydata.xml") ## End(Not run)