Title: | Functions and Datasets Required for ST370 Class |
---|---|
Description: | Provides functions and datasets required for the ST 370 course at North Carolina State University. |
Authors: | Dennis D. Boos, Douglas Nychka |
Maintainer: | Dennis Boos <[email protected]> |
License: | GPL (>= 2) |
Version: | 4.0 |
Built: | 2024-11-01 11:52:05 UTC |
Source: | CRAN |
Small propulsion units, called actuators, are used to maneuver spacecraft once they are in space. In order to control these motions accurately, the actuator needs to produce a precise amount of force. This data set represents an experiment to understand which factors affect the variability of the force produced by an actuator. The actuator is fired using compressed air, and the factors studied are the actuator used (act), the amount of pressure used (press), the length of the air supply line (line) and the nozzle type (nozzle).
A data frame with 16 observations on the following 6 variables.
actuator used (A1 or A2)
amount of pressure used (30psi or 100psi)
length of the air supply line (20ft or 40ft)
nozzle type (rightang or straight)
force produced
experimental order
bplot(actuator$force, by=actuator$act)
lplot(actuator$act, actuator$force, actuator$press)
anova( lm(force ~ (act+press+nozzle+line)^2, data=actuator) )
The airplane data frame has 24 rows and 2 columns. Each data point is the distance flown by one of the 24 airplanes randomly assigned to the four treatments described below.
A data frame with 24 observations on the following 2 variables.
: distance flown
: one of four treatment values (treat1: no weighting of airplane nose, treat2: one paper clip on the nose, treat3: two paper clips on the nose or treat4: three paper clips on the nose)
Motivated by a class experiment (but artificial).
# Make side by side boxplots of the four treatments:
bplot(airplane$distance, airplane$treatment)
This data set contains the times in seconds that it takes Alka-Seltzer tablets to dissolve in water and 7UP at two different temperatures.
A data frame with 8 observations on the following 4 variables.
: liquid (7UP or water)
: temperature (cool or warm)
: time to dissolve (in seconds)
: blocking level for 2x2 factorial design
Density, distribution function, quantile function and random generation for the Bernoulli distribution with parameter prob.
dbern(x, prob, log = FALSE)
pbern(q, prob, lower.tail = TRUE, log.p = FALSE)
qbern(p, prob, lower.tail = TRUE, log.p = FALSE)
rbern(n, prob)
x, q |
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If length(n) > 1, the length is taken to be the number required. |
prob |
probability of success on each trial. |
log, log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x]; otherwise, P[X > x]. |
The Bernoulli distribution with parameter prob = p has density p(x) = p^x (1-p)^(1-x) for x = 0, 1.
If an element of x is not 0 or 1, the result of dbern is zero, without a warning.
The density is computed using Loader's algorithm; see the reference below.
The quantile is defined as the smallest value x such that F(x) >= p, where F is the distribution function.
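Since a Bernoulli variable is a binomial with one trial, both the density and this quantile definition can be sanity-checked with base R's dbinom/qbinom alone (a sketch that does not require Rlab's dbern/qbern):

```r
# Bernoulli(p) is Binomial(size = 1, p), so base R can stand in for dbern/qbern.
p <- 0.7
# density: p(0) = 1 - p, p(1) = p
stopifnot(all.equal(dbinom(0:1, size = 1, prob = p), c(1 - p, p)))
# quantile: smallest x with F(x) >= q; here F(0) = 1 - p = 0.3 and F(1) = 1
stopifnot(qbinom(0.2, size = 1, prob = p) == 0)  # F(0) = 0.3 >= 0.2
stopifnot(qbinom(0.4, size = 1, prob = p) == 1)  # F(0) = 0.3 <  0.4
```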
dbern gives the density, pbern gives the distribution function, qbern gives the quantile function and rbern generates random deviates.
Catherine Loader (2000). Fast and Accurate Computation of Binomial Probabilities; manuscript available from http://cm.bell-labs.com/cm/ms/departments/sia/catherine/dbinom
dbinom for the binomial (Bernoulli is a special case of the binomial), and dpois for the Poisson distribution.
# Compute P(X=1) for X ~ Bernoulli(0.7)
dbern(1, 0.7)
Plots boxplots of several groups of data and allows for placement at different horizontal or vertical positions. It is also flexible in the input object, accepting either a list or a matrix.
bplot(x, by, style = "tukey", outlier = TRUE, plot = TRUE, ...)
x |
Vector, matrix, list or data frame. A vector may be divided according to the by argument. Matrices and data frames are separated by columns and lists by components. |
by |
If x is a vector, an optional vector (either character or numerical) specifying the categories to divide x into separate data sets. |
style |
Type of boxplot; default is "tukey". The other choice is "quantile", where the whiskers are drawn to the 5th and 95th percentiles instead of being based on the inner fences. |
outlier |
If TRUE, outliers (points beyond the outer fences) are added to the plots. |
plot |
If FALSE, just returns a list with the statistics used for plotting the box plots. |
... |
Other arguments controlling the boxplots (passed to bplot.obj); these are listed below. Other graphical plotting arguments not matched (e.g. yaxt) are used in the call to plot to set up the initial plot if add=TRUE.
|
This function was created as a complement to the usual S function for boxplots. The current function makes it possible to put the boxplots at unequal x or y positions, which is useful for visually grouping a large set of boxplots into several groups. Placement of the boxplots with respect to the axis can also add information to the plot. Another aspect is the emphasis on data structures for groups of data; one useful feature is the by option to break up the x vector into distinct groups. If 5 or fewer observations are in a group, the points themselves are plotted instead of a box.
The function is broken into two steps: a call to stats.bplot to find the statistics and a call to bplot.obj to plot the resulting object. The user is referred to describe.bplot to modify the statistics used and to draw.bplot.obj to modify how the bplot is drawn.
Finally to bin data into groups based on a continuous variable and to make bplots of each group see bplot.xy.
boxplot, bplot.xy, lplot, mplot, plot
set.seed(123)
temp <- matrix(rnorm(12*8), ncol=12)
pos <- c(1:6, 9:14)
bplot(temp)
bplot(temp, pos=pos, labels=paste("D", 1:12), horizontal=TRUE)
bplot(temp, pos=pos, label.cex=0, horizontal=TRUE)
# add an axis
axis(2)
Draws boxplots for y by binning on x. This gives a coarse, but quick, representation of the conditional distribution of [Y|X] in terms of boxplots.
bplot.xy(x, y, N = 10, breaks = pretty(x, N, eps.correct = 1), style = "tukey", outlier = TRUE, plot = TRUE, xaxt = "s", ...)
x |
Vector to use for bin membership |
y |
Vector to use for constructing boxplot statistics. |
N |
Number of bins on x. Default is 10. |
breaks |
Break points defining bin boundaries. These can be unequally spaced. |
style |
Type of boxplot; default is "tukey". The other choice is "quantile", where the whiskers are drawn to the 5th and 95th percentiles instead of being based on the inner fences. |
xaxt |
Plotting parameter for x-axis generation. Default is to produce an x-axis. |
outlier |
If TRUE, outliers (points beyond the outer fences) are added to the plots. |
plot |
If FALSE, just returns a list with the statistics used for plotting the box plots. |
... |
Any other optional arguments passed to the bplot.obj function; see the help file for bplot for details. |
bplot, boxplot
# bivariate normal, corr = .6
set.seed(123)
x <- rnorm(1000)
y <- .6*x + sqrt(1 - .6^2)*rnorm(1000)
bplot.xy(x, y, breaks=seq(-3, 3,, 15), xlim=c(-4,4), ylim=c(-4,4))
points(x, y, pch=".", col=3)
The data set bread contains height measurements of 48 cupcakes. A batch of Hodgson Mill Wholesome White Bread mix was divided into three parts and mixed with 0.75, 1.0, and 1.25 teaspoons of yeast, respectively. Each part was made into 8 different cupcakes and baked at 350 degrees. After baking, the height of each cupcake was measured. Then the experiment was repeated at 450 degrees.
A data frame with 48 observations on the following 3 variables.
: quantity of yeast (.75, 1 or 1.25 teaspoons)
: baking temperature (350 or 450 degrees)
: cupcake height
The data set bread2
contains averaged measurements
from the full data set, bread
.
The 8 cupcakes in each temp/yeast combination have been averaged.
A data frame with 6 observations on the following 3 variables.
: quantity of yeast (.75, 1 or 1.25 teaspoons)
: baking temperature (350 or 450 degrees)
: cupcake height
The data set cancer
examines a relationship between lung cancer
and cigarette smoking. The data consist of a standardized measure of
smoking amount (smoke) and the standardized
mortality ratio (SMR) for males in England and Wales in 1970-72 who
were working in 25 different broad groups of jobs such as textile
workers, miners, etc.
A data frame with 25 observations on the following 2 variables.
: standardized measure of smoking amount
: standardized mortality ratio
A Handbook of Small Data Sets by Hand, et al. (1994, p.67).
The capac
data set measures the capacitance of a
capacitor built with
one of 5 shapes and 3 different sizes (area). Other
covariates are perimeter length and number of
discontinuities.
A data frame with 15 observations on the following 5 variables.
: measured capacitance
: shape of the capacitor
: perimeter length of the capacitor
: size of the capacitor
: number of discontinuities
# Make a means plot of capacitance by shape and area.
mplot(capac$capac, capac$shape, capac$area, both=TRUE)
Newton's law of gravitation states that the force of attraction (f) between two particles of matter is given by the formula f = Gmm'/r^2, where m and m' are their respective masses, r the distance between their centers of gravity, and G is the gravitational constant, independent of the kind of matter or intervening medium. From the late eighteenth through nineteenth centuries, a large number of experiments were performed in order to determine G. These experiments were usually designed to determine the earth's attraction of masses and described as experiments to determine the mean density of the earth: if the earth is supposed spherical with radius R and g is the acceleration toward the earth due to gravity, then Newton's law becomes dG = 3g/(4*pi*R), where d is the mean density (in g/cm^3) of the earth. Since g and R could be supposed known, determination of d could be viewed as equivalent to determination of G.
Of all these early experiments, that of Cavendish, performed in 1798 using a torsion balance devised by Michell, is generally considered the best. The completeness of his description of his experiments and the excellence of his methods are often described as an ideal example of scientific experimentation. Cavendish concluded his memoir by presenting 29 determinations of the mean density of the earth. After the 6th of these determinations, Cavendish changed his experimental apparatus by replacing a suspension wire by one that was stiffer. Another interesting feature of the data is that Cavendish calculated the sample mean incorrectly: somehow he used 5.88 instead of 4.88 for the 3rd value. This was first noticed by Baily in 1843 but overlooked by Laplace's analysis of the data in 1820. The "true value" of d is 5.517 (1977 Encyclopedia Britannica).
The data and above description were taken from Stigler (1977, The Annals of Statistics, p. 1055-1098) who obtained it from The Laws of Gravitation edited by A. S. Mackenzie.
A numeric vector with 29 values.
plot(cavendish)
This data frame contains information about 50 of the largest US cities, including location, rainfall, temperature and elevation.
A data frame with 50 observations on the following 7 variables.
: latitude
: average minimum January temperature (degrees F)
: average rainfall in inches
: city names
: average maximum July temperature
: elevation above sea level
: longitude
The Universal Almanac (1992), ed. John W. Wright, Andrews and McNeel, Kansas City.
The data frame college contains statistics relating to colleges from 15 states. This is a sample of fifteen states and certain statistics taken from the Chronicle of Higher Education (most data is for 1992). All entries are in thousands, so that Arkansas (first row) has a population of 2,399,000, a yearly per capita income of \$15,400, 85,700 undergraduate students, 7,000 graduate students, an average cost of tuition and fees at public universities of \$1,540, and is located in the south (s for south).
A data frame with 15 observations on the following 7 variables (all data in thousands).
State in which school is located.
State population.
Yearly per capita income.
Total number of undergraduate students.
Total number of graduate students.
Average cost of tuition and fees.
Area of the country (s for south, w for west, ne for northeast, mw for midwest).
Count the number of times the values in the vector meet the specified conditions.
count(x)
x |
Vector and condition to count. |
length, nchar
set.seed(1)
x <- rnorm(100)
# Count the number of times the values in x are greater than 0
count( x>0 )
# Count the number of times the values in x are within the 95% confidence interval
count( (x>-1.96) & (x<1.96) )
# Or could have used
count( abs(x)<1.96 )
# Count the number of times the values in x are the same as the first element
count( x==x[1] )
The data set draft
contains average lottery numbers by month
for the 1970 Draft Lottery. In December of 1969 the U.S. randomly drew
from the 366 possible birthdays without replacement.
The draw order of each birthday determined the order by which men born between 1944-1950
(those eligible in the 1970 draft) were drafted. For example, a person with a birthday lottery
number of 63 was drafted fairly early in 1970; a person with number 300 was not drafted
at all. Sommers (2003, Chance Magazine) looked up deaths by age and birthday
on the Vietnam Veterans Memorial. Thus, the data set has deaths by month as well.
A data frame with 12 observations on the following 4 variables.
: Month of the birthday
: Average lottery number of all birthdays in the month
: Total number of deaths by month
: breaks months into 2 groups (first for Jan-June and second for July-Dec)
Death statistics available on-line at http://thewall-usa.com/.
The data set drill
contains the results of testing two types of drill
bits in the manufacture of compressors. There were two brands considered (Besley and Cleveland),
and the measurements are the number of holes drilled until the bit breaks.
The tests were done under the same manufacturing conditions, and the influence on performance due
to factors other than the brand was minimized.
A data frame with 14 observations on the following 3 variables.
: drill manufacturer (Besley or Cleveland)
: number of holes drilled before break
: price of a bit
lplot(drill$brand, drill$price/drill$holes,
      main='Price per Hole for Drill Bits', ylab='Price per Hole')
The data set earthq
includes the dominant frequency and magnitude of
148 earthquakes.
A data frame with 148 observations on the following 5 variables.
: location of the earthquake
: dominant frequency
: magnitude
: depth
: distance
Earthquake Engineering and Structural Dynamics, Vol. 23, p. 583-597, 1994
The data set etruscan
contains the maximum width for 84 skulls
of Etruscan males and 70 modern Italian males. This data was gathered
in an attempt to determine if Etruscans were native Italians or immigrants
from another land.
A data frame with 154 observations on the following 2 variables.
: skull width (in mm)
: ancient or modern
Medical Biology and Etruscan Origins, p. 136
Density, distribution function, quantile function and random
generation for the exponential distribution with mean beta (or 1/rate).
This special Rlab implementation allows the parameter beta
to be used, to match the function description often found in textbooks.
dexp(x, rate = 1, beta = 1/rate, log = FALSE)
pexp(q, rate = 1, beta = 1/rate, lower.tail = TRUE, log.p = FALSE)
qexp(p, rate = 1, beta = 1/rate, lower.tail = TRUE, log.p = FALSE)
rexp(n, rate = 1, beta = 1/rate)
x, q |
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If length(n) > 1, the length is taken to be the number required. |
beta |
vector of means. |
rate |
vector of rates. |
log, log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x]; otherwise, P[X > x]. |
If beta (or rate) is not specified, it assumes the default value of 1.
The exponential distribution with rate lambda has density f(x) = lambda * e^(-lambda*x) for x >= 0.
dexp gives the density, pexp gives the distribution function, qexp gives the quantile function, and rexp generates random deviates.
The cumulative hazard H(t) = -log(1 - F(t)) is -pexp(t, r, lower = FALSE, log = TRUE).
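Because F(t) = 1 - exp(-rate * t), the cumulative hazard of the exponential reduces to rate * t; a quick base-R check of the identity above:

```r
# H(t) = -log(1 - F(t)) = rate * t for the exponential distribution.
t <- 2
r <- 0.5
H <- -pexp(t, r, lower.tail = FALSE, log.p = TRUE)
stopifnot(all.equal(H, r * t))  # 0.5 * 2 = 1
```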
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth \& Brooks/Cole.
exp
for the exponential function,
dgamma
for the gamma distribution and
dweibull
for the Weibull distribution, both of which
generalize the exponential.
dexp(1) - exp(-1) #-> 0
The data set framerate
contains processor speed, memory size, and
screen resolution for Riva TNT video cards. The frame rates for these cards
were measured using Quake II, a standard benchmarking program for 3D graphics.
A data frame with 36 observations on the following 6 variables.
: processor (Celeron 333 or Pentium II 450)
: memory size (64, 128 or 256 kB)
: screen resolution (640x480, 800x600 or 1024x768)
: frames per second
: total number of screen pixels
: 0 for Celeron, 1 for Pentium
Several students studied the relationship between file size and transfer times using ftp (File Transfer Protocol) to retrieve files from two Internet locations. At each location three different files were transferred 5 times and averaged (to reduce variability).
A data frame with 6 observations on the following 3 variables.
: file size (in bytes)
: transfer time (in seconds)
: internet location (0 or 1)
The data in ftptime
are 40 ftp times for a file of 343285 bytes
which was repeatedly obtained from a site in California.
A numeric vector with 40 values.
Density, distribution function, quantile function and random
generation for the Gamma distribution with parameters alpha
(or shape
) and beta
(or scale
or 1/rate
).
This special Rlab implementation allows the parameters alpha
and beta
to be used, to match the function description
often found in textbooks.
dgamma(x, shape, rate = 1, scale = 1/rate, alpha = shape, beta = scale, log = FALSE)
pgamma(q, shape, rate = 1, scale = 1/rate, alpha = shape, beta = scale, lower.tail = TRUE, log.p = FALSE)
qgamma(p, shape, rate = 1, scale = 1/rate, alpha = shape, beta = scale, lower.tail = TRUE, log.p = FALSE)
rgamma(n, shape, rate = 1, scale = 1/rate, alpha = shape, beta = scale)
x, q |
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If length(n) > 1, the length is taken to be the number required. |
rate |
an alternative way to specify the scale. |
alpha, beta |
an alternative way to specify the shape and scale. |
shape, scale |
shape and scale parameters. |
log, log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x]; otherwise, P[X > x]. |
If beta (or scale or rate) is omitted, it assumes the default value of 1.
The Gamma distribution with parameters alpha (or shape) and beta (or scale) has density

f(x) = 1/(beta^alpha * Gamma(alpha)) * x^(alpha-1) * e^(-x/beta)

for x >= 0, alpha > 0 and beta > 0. The mean and variance are E(X) = alpha*beta and Var(X) = alpha*beta^2.
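These moment formulas can be checked by simulation with base R's rgamma (a sketch; the alpha and beta values are arbitrary):

```r
# Simulated check of E(X) = alpha*beta and Var(X) = alpha*beta^2.
set.seed(1)
alpha <- 2
beta  <- 3
x <- rgamma(1e5, shape = alpha, scale = beta)
mean(x)  # should be near alpha * beta   = 6
var(x)   # should be near alpha * beta^2 = 18
```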
pgamma()
uses algorithm AS 239, see the references.
dgamma gives the density, pgamma gives the distribution function, qgamma gives the quantile function, and rgamma generates random deviates.
The S parametrization is via shape and rate: S has no scale parameter.
The cumulative hazard H(t) = -log(1 - F(t)) is -pgamma(t, ..., lower = FALSE, log = TRUE).
pgamma is closely related to the incomplete gamma function. As defined by Abramowitz and Stegun 6.5.1,

P(a, x) = 1/Gamma(a) * integral_0^x t^(a-1) e^(-t) dt

is pgamma(x, a). Other authors (for example Karl Pearson in his 1922 tables) omit the normalizing factor, defining the incomplete gamma function as pgamma(x, a) * gamma(a).
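The relation to the regularized incomplete gamma function can be verified numerically with base R's integrate (illustrative values for a and x):

```r
# pgamma(x, a) equals P(a, x) = (1/Gamma(a)) * integral_0^x t^(a-1) e^(-t) dt.
a <- 2.5
x <- 1.8
P <- integrate(function(t) t^(a - 1) * exp(-t), lower = 0, upper = x)$value / gamma(a)
stopifnot(all.equal(P, pgamma(x, a), tolerance = 1e-6))
```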
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth \& Brooks/Cole.
Shea, B. L. (1988) Algorithm AS 239, Chi-squared and Incomplete Gamma Integral, Applied Statistics (JRSS C) 37, 466–473.
Abramowitz, M. and Stegun, I. A. (1972) Handbook of Mathematical Functions. New York: Dover. Chapter 6: Gamma and Related Functions.
gamma for the Gamma function, dbeta for the Beta distribution and dchisq for the chi-squared distribution, which is a special case of the Gamma distribution.
-log(dgamma(1:4, alpha=1))
p <- (1:9)/10
pgamma(qgamma(p, alpha=2), alpha=2)
1 - 1/exp(qgamma(p, alpha=1))
The data set golf
was taken from PGA Tour Records
of 195 golf rounds by PGA players in an attempt to explain
what golf attributes contribute the most to low scores.
A data frame with 195 observations on the following 7 variables.
: score on the 18 holes (par 72)
: average distance of drive on two holes
in opposite direction (to balance out wind)
: percentage of times that drive was in the
fairway for par 4 and par 5 holes
: average number of putts in the round for
holes where the green was hit in regulation
: an estimate of sand trap play accuracy
based on the residuals from regressing percentage
of successful pars from traps on putts
: based on the residuals from regressing the score
on par three holes on putts, sand, and chip
: an estimate of chipping accuracy
based on the residuals from regressing percentage
of successful pars on holes not hit in regulation
on putts
"Drive for Show and Putt for Dough" by Scott M. Berry,
Chance, Vol. 12, No. 4, p. 50-55, 1999
Plots a histogram in the same manner as hist, but with the following changes: freq = FALSE by default, to print the density instead of the frequency, and nclass specifies the exact number of bins to use (calculated by equally separating the distance between the min and max value to be graphed).
hplot(x, breaks = "Sturges", freq = FALSE, nclass = NULL, col = 8, ...)
x |
a vector of values for which the histogram is desired. |
breaks |
see help on hist. |
freq |
logical; if 'FALSE' (default), relative frequencies ("probabilities"), component 'density', are plotted; if 'TRUE', the histogram graphic is a representation of frequencies, the 'counts' component of the result. |
nclass |
numeric (integer); the number of bins for the histogram. If both breaks and nclass are specified, nclass is used. |
col |
color of the histogram bars (8, the default, is grey). |
... |
Other arguments controlling the plot. Many graphical plotting arguments may be used. See help on plot. |
hist, plot
# Create and graph some Normal data
set.seed(100)
set.panel(3,1)
z <- rnorm(100)
hplot(z, nclass=5, main="Standard Normal", xlim=c(-10,10), ylim=c(0,.4))
z <- rnorm(100, sd=2)
hplot(z, nclass=10, main="Std Dev of 2", xlim=c(-10,10), ylim=c(0,.4))
z <- rnorm(100, sd=3)
hplot(z, nclass=15, main="Std Dev of 3", xlim=c(-10,10), ylim=c(0,.4))
The data set insulate
is one person's record of weekly gas
consumption (gas) and outside temperature (temp), before (insulation=0)
and after (insulation=1) insulating a house. The house thermostat was
set at 20 degrees Celsius during the 26 weeks before and 30 weeks after
insulating.
A data frame with 56 observations on the following 3 variables.
: before insulation (0) or after (1)
: outside temperature (in degrees Celsius)
: gas consumption (in 1000 cubic feet)
A Handbook of Small Data Sets
This data set is a subset of the actuator
data set
without the line
or nozzle
factors.
A data frame with 16 observations on the following 4 variables.
actuator used (A1 or A2)
amount of pressure used (30psi or 100psi)
force produced
experimental order
bplot(jet$force, by=jet$act)
mplot(jet$force, jet$act, jet$press, both=TRUE)
anova( lm(force ~ act+press+act:press, data=jet) )
4th moment kurtosis ratio
kurt(x)
x |
vector |
skew
set.seed(1)
x <- rexp(100)
# Get kurtosis coefficient estimate for exponential distribution
kurt(x)
Plots x versus y with optional labels. The x or y variable may be a character vector, but not both.
lplot(x, y, labels = "*", srt = 0, tcex = 0.7, ...)
x |
Vector to be graphed on x-axis. May be a character vector, if y is numeric. |
y |
Vector to be graphed on y-axis. May be a character vector, if x is numeric. |
labels |
Character vector containing the labels for individual points. |
srt |
A numerical value specifying (in degrees) how strings should be rotated. It is unwise to expect values other than multiples of 90 to work. See help on par. |
tcex |
A numerical value giving the amount by which the labels text or symbols should be scaled relative to the default. |
... |
Other arguments controlling the plot. Many graphical plotting arguments may be used. See help on plot. |
plot, bplot, boxplot, mplot
# Create some Normal data
set.seed(123)
temp <- data.frame(matrix(rnorm(12*8), ncol=12))
pos <- c(1:6, 9:14)
lplot(temp)
# Now see some labels
lplot(temp, labels=paste("Y", 1:12), tcex=.5)
# Create a data set with two factors (age and gender)
race <- data.frame(c(1.02,.99,1.11,1.30,1.09,1.26,1.21,1.19,1.30,1.45,1.34,1.49),
                   c('M','M','M','M','M','M','F','F','F','F','F','F'),
                   c('under 50','under 50','under 50','over 50','over 50','over 50',
                     'under 50','under 50','under 50','over 50','over 50','over 50'))
names(race) <- c("time","gender","age")
# Plot the data to see the factors
lplot(race$gender, race$time, race$age)
List the objects in Rlab. By default the Rlab datasets are listed, however "functions" or "all" can be specified to list only the Rlab functions or everything in Rlab.
ls.rlab(what="data") ls.summary.rlab(what="data")
what |
character string specifying which Rlab object to list, which may be one of
"data" or "d" (default) : lists datasets
"functions" or "f" : lists functions
"all" or "a" : lists everything
"ex" or "e" : lists the files which can be viewed with the |
The ls.summary.rlab
function will list various object attributes, such as class and size.
ls, search
# list all Rlab datasets and their sizes
ls.summary.rlab()
# list all Rlab functions
ls.rlab("functions")
The magnet dataset is from an experiment concerning the magnetic force of an electromagnet as a function of voltage and number of wire turns. The device was a wire wrapped around a core, and measurements were taken at a variety of voltages. The statistical design here is actually a randomized complete block design where the three electromagnets are the blocks, and the three voltages are levels of the factor voltage.
Voltage applied (1.5 or 3.0 volts).
The number of wire turns (100, 200, or 300, as factors).
The magnetic force.
The data consists of the winning times (in minutes) for men and women at the New York Marathon, 1978-1998, along with the temperature in Fahrenheit.
A data frame with 21 observations on the following 4 variables.
: 4-digit year
: temperature (in Fahrenheit)
: men's winning time (in minutes)
: women's winning time (in minutes)
"The Effects of Temperature on Marathon Runners' Performance" by David E. Martin and John F. Buoncristiani, Chance, Vol. 12, No. 4, 1999
Calculates means for individual factors and two-way factor combinations. Any number of factors may be input, and the individual factor means as well as all possible two-way means will be shown for each factor. Three-way, four-way, etc. means are not shown, even when more than 2 factors are given.
means(y, ..., dec = 3)
y |
Vector of responses whose means are shown. |
... |
Vectors of independent variables on which the responses' means are broken down. |
dec |
Number of decimal places to print. |
mplot, mean
# Create a data set with two factors (age and gender)
race <- data.frame(c(1.02,.99,1.11,1.30,1.09,1.26,1.21,1.19,1.30,1.45,1.34,1.49),
                   c('M','M','M','M','M','M','F','F','F','F','F','F'),
                   c('under 50','under 50','under 50','over 50','over 50','over 50',
                     'under 50','under 50','under 50','over 50','over 50','over 50'))
names(race) <- c("time","gender","age")
# Show mean times broken down by age, gender and age & gender
means(race$time, race$age, race$gender)
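The same one-way and two-way breakdowns can be sketched with base R's tapply (illustrative only; it returns plain tables rather than the formatted output of Rlab's means()):

```r
# Base-R equivalent of the one-way and two-way means computed by means().
time   <- c(1.02, .99, 1.11, 1.30, 1.09, 1.26, 1.21, 1.19, 1.30, 1.45, 1.34, 1.49)
gender <- rep(c("M", "F"), each = 6)
age    <- rep(rep(c("under 50", "over 50"), each = 3), 2)
tapply(time, gender, mean)              # one-way means by gender
tapply(time, list(age, gender), mean)   # two-way table of means
```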
The data set metalcut
is an attempt to determine which cutting
method (vertical or horizontal) yields the quickest and smoothest cuts
for three types of metal stock (angle, flat, round). Students from the
Biological Engineering Department measured cutting times and quality
for the six combinations of method and stock.
A data frame with 18 observations on the following 4 variables.
: cutting method (hcut for horizontal or vcut for vertical)
: type of metal stock (angle, flat or round)
: cutting time (in seconds)
: smooth (0) or rough (1)
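A quick two-factor analysis of cutting time could be sketched as follows (a sketch only: it assumes the Rlab package is loaded and that the columns are named method, stock, and time — hypothetical names, since the actual names are not shown above):

```r
# Sketch: compare cutting times across methods and stock types
# (the column names method, stock, and time are assumptions)
data(metalcut)
bplot(metalcut$time, by = metalcut$method)             # cutting time by method
mplot(metalcut$time, metalcut$method, metalcut$stock)  # interaction plot
anova(lm(time ~ method * stock, data = metalcut))      # two-way ANOVA
```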
Calculates main fitted effects for individual factors and two-way interaction fitted effects for all pairs of factors. Any number of factors may be input. Three-way, four-way, etc. fitted effects are not shown, even when more than two factors are given.
mfit(y, ..., dec = 3)
y |
Vector of responses whose fitted effects are shown. |
... |
Vectors of different factors. |
dec |
Number of decimal places to print. |
lm, fitted
# Create a data set with two factors (age and gender)
race <- data.frame(c(1.02,.99,1.11,1.30,1.09,1.26,1.21,1.19,1.30,1.45,1.34,1.49),
                   c('M','M','M','M','M','M','F','F','F','F','F','F'),
                   c('under 50','under 50','under 50','over 50','over 50','over 50',
                     'under 50','under 50','under 50','over 50','over 50','over 50'))
names(race) <- c("time","gender","age")
# Show fitted effects for age, gender and age & gender
mfit(race$time, race$age, race$gender)
These data are actually measurements obtained by Michelson between June 5, 1879, and July 2, 1879. The data are in km/sec if 299000 is added to each value. Working backwards from the current ‘true value’ of the velocity of light in vacuum (299,792.5 km/sec) and using Michelson's own adjustment for the effect of air, the comparable ‘true value’ for these data is 734.5 (considerably smaller than the actual measurements). Michelson used a modification of Foucault's 1850 experimental method which consisted of passing light from a source off a rapidly rotating mirror to a distant fixed mirror, and back to the rotating mirror. Presumably the five sets of 20 measurements are in time sequence. From Stigler (1977 Annals of Statistics, p.1073-1076, Table 6).
A data frame with 100 observations on the following 2 variables.
: measured velocity of light as described above
: the set in which the measurement was taken
lplot(michelson$velocity, michelson$set)
bplot(michelson$velocity, michelson$set)
The data set monarch
contains the years lived after inauguration,
election, or coronation of popes, U.S. presidents, and British monarchs
from 1690 to 1970.
A data frame with 72 observations on the following 3 variables.
: group (K&Qs, popes or pres)
: year lived after coronation, inauguration or election
: name of the monarch, pope or president (no spaces)
Computer-Active Data Analysis by Lunn and McNeil (1991)
Graphs means for two-way factor combinations (interaction plots). Any number of factors may be included and all possible two factor combinations will be plotted.
mplot(y, ..., both = FALSE)
y |
Vector of responses whose means are graphed. |
... |
Vectors of independent variables on which the responses' means are broken down. |
both |
If TRUE, creates additional plots with the opposite factor on the x-axis. |
interaction.plot, means, mean
# Create a data set with three factors (age, gender and number of water breaks)
race <- data.frame(c(1.02,.99,1.11,1.30,1.09,1.26,1.21,1.19,1.30,1.45,1.34,1.49),
                   c('M','M','M','M','M','M','F','F','F','F','F','F'),
                   c('under 50','under 50','under 50','over 50','over 50','over 50',
                     'under 50','under 50','under 50','over 50','over 50','over 50'),
                   c(1,0,2,2,0,1,2,1,0,2,1,0))
names(race) <- c("time","gender","age","water")
# Show mean times broken down by age, gender and age & gender
mplot(race$time, race$age, race$gender)
# Show 2 plots, with age and then gender along the x-axis
mplot(race$time, race$age, race$gender, both = TRUE)
# Now also consider water breaks
mplot(race$time, race$age, race$gender, race$water, both = TRUE)
# Print the means for the above plots
means(race$time, race$age, race$gender, race$water)
The ncsu data frame has 92 rows and 2 columns. It gives the number of degrees granted at North Carolina State University (NCSU) from 1894 to 1983.
A data frame with 92 observations on the following 2 variables.
: 4 digit year
: number of degrees granted
Synthetic versus Conventional Oil
A data frame with 8 observations on the following 3 variables.
: type of oil (conv or syn)
: viscosity
: time
The ozone data frame has 552 rows and 3 columns. The first column is the mean monthly ozone concentration in Dobson units of the ozone layer at Arosa, Switzerland, from 1926 to 1971.
A data frame with 518 observations on the following 3 variables.
: mean monthly ozone in Dobson units
: year in which measurements were taken (4 digits)
: month in which measurements were taken (3 letter abbreviation)
In Spring 2000 a team measured ping times of Internet servers at various
distances from Raleigh using a software program called NeoTrace. They
actually measured ping times at 4 different times of the day, but since
there was very little difference over time, we have averaged over the
times of day.
A data frame with 12 observations on the following 2 variables.
: distance from Raleigh
: ping response time
The popcorn data frame has 16 rows and 6 columns. Four factors are varied to see what produces the largest volume of popcorn. The order in which the observations were made is given in the column order.
A data frame with 16 observations on the following 5 variables.
: brand of popcorn used (Orville Reddenbach or Jolly Time)
: temperature of oil (hot or cold)
: quantity of oil used (3 Tsp or 1 Tsp)
: whether the pan was shaken (yes or no)
: volume of popcorn produced
This experiment was designed and carried out by Stan Spencer for the NCSU statistics class ST516. Here is an excerpt of his report:
INTRODUCTION
Popcorn has always been a crucial element of sustenance in my life and I've always wondered what effects certain factors have in the making of a good batch. Now, having acquired some of the basic tools of statistical experimentation I have been able to optimize my frequent ritual of popping popcorn as well as understand exactly how much of an effect these factors have on the desired outcome. The purpose of the experiment was to optimize the factors involved for the maximum volume of popcorn. I focused on stovetop popping and didn't look at microwave or air-pop methods.
DESIGN AND TEST CONDITIONS
I chose the two major popcorn brands with the motive of trying to prove if Orville's claims are true. The second factor refers to the temperature of the oil at the time the popcorn was put in the pan. For the experiments requiring cold oil I added the popcorn to the oil before putting the pan on the stove. For the hot oil treatments I let it heat for 20 seconds before adding the popcorn. The quantity of oil factor required either one or three tablespoons. The last factor I thought was important was to either shake or not shake the pan during cooking. The conditions of the test that were kept constant for each treatment are: 1/2 cup of popcorn was used, the pan was cooled and washed between treatments, the gas flame was set to a constant, and the same pan and oil type were used for each treatment. The volume was measured with a measuring cup with units in mL.
Average monthly rainfall in Raleigh, NC for 1948-1992. Values are
recorded only for February, March, May, June, and August.
A data frame with 45 observations on the following 6 variables.
: 2 digit year
: February rainfall (in inches)
: March rainfall (in inches)
: June rainfall (in inches)
: August rainfall (in inches)
: May rainfall (in inches)
The data set quake
contains the strengths of earthquakes measured
at the earth's continental plates. Much of the earth's seismic activity
is due to motion of the large plates that make up the crust of the earth.
Earthquakes occur when a buildup in tension between two layers of rock is
suddenly released. For this reason many earthquakes occur at plate
boundaries.
A data frame with 496 observations on the following 5 variables.
: latitude of the event
: longitude of the event
: direction of earthquake
: strength of the earthquake (Richter scale)
: numerical code for plate boundary
The data set raleigh.snow
contains the annual snowfall totals
for Raleigh, NC from the 1962-63 season through the 1991-92 season.
A data frame with 30 observations on the following 2 variables.
: 2 digit year
: annual snowfall (in inches)
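A simple plot of the totals over time could be sketched as follows (a sketch only: it assumes the columns are named year and snow, which are hypothetical names since the actual names are not shown above):

```r
# Sketch: annual snowfall totals over time
# (the column names year and snow are assumptions)
data(raleigh.snow)
plot(raleigh.snow$year, raleigh.snow$snow, type = "h",
     xlab = "Season (2-digit year)", ylab = "Snowfall (inches)")
```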
The raleigh.temp data frame has 480 rows and 3 columns. Each year has 12 rows of data, one for each month. The measurement is likely to be the average of the average daily temperature, where the average daily temperature = (daily high+daily low)/2.
A data frame with 480 observations on the following 3 variables.
: temperature = (daily high+daily low)/2
: the month in which the measurements were taken
: the year in which the measurements were taken
Rlab is a collection of functions and datasets to be used in the class ST 370, Probability and Statistics for Engineers, at North Carolina State University. For more information see the class labs at: https://www4.stat.ncsu.edu/~bmasmith/NewST370WEB/rlab/rlab.html
Some major methods include:
bplot
- customized boxplot
hplot
- customized histogram
lplot
- label plot (allows character z or y)
means
- 2-way means
mfit
- 2-way interaction fit
mplot
- means plot
stats
- variety of statistics
US
- plot of the United States
world
- plot of the world
These labs are based on Slab and Mlab by Doug Nychka and Dennis Boos.
DISCLAIMER:
This is software for statistical research and not for commercial use. The authors do not guarantee the correctness of any function or program in this package. Any changes to the software should not be made without the authors' permission.
When salt comes into contact with ice, it tends to break apart into
individual ions which then interact with the frozen water and disrupt
hydrogen bonds that have formed between ice molecules. This lowers the
melting temperature of ice, and it was hypothesized that the melting
process would be hastened. The data set salt
was collected
during an experiment to determine whether varying the type and amount
of salt applied to a specific amount of ice has an effect on the
interval required to melt that ice.
data(salt)
A data frame with 24 observations on the following 3 variables.
: type of salt (rock salt or table salt)
: amount of salt used (in teaspoons)
: time for ice to melt (in seconds)
Background: The Effect of Salt on the Rate at Which Ice Melts
In those sections of the country that experience winter as a time of
snow and ice, salt is often spread on roadways in an attempt to counter
the hazardous consequences of accumulated ice. Ice is formed when the
relatively disordered molecules in liquid water reach a temperature of
32 degrees F (0 degrees C) and begin to "nucleate" or form solid ice
crystals consisting of ordered water molecules. Salt, when in contact
with ice, tends to break apart into individual ions (i.e. sodium and
chloride) which then interact with the water and disrupt the hydrogen
bonds that have formed between water molecules. Since no covalent bonds
are broken or formed, the resulting chemical "solvation" is not
considered to be a chemical reaction. However, the end result from the
introduction of salt is that the ice crystals are disrupted and liquid
water is achieved.
The purpose of the current experiment is to study the effect of salt on
the rate at which ice melts. More specifically, the experiment is being
conducted to answer the following questions:
1. Does varying the amount of salt applied to a constant quantity of ice
result in a change in the rate of melting?
2. Does the type of salt used have an effect on the melting rate?
The first question is of interest as it relates to issues such as the
cost of salt and the potential harmful effects of its use on pavement.
If increasing the amount of salt applied to a given quantity of ice is
not accompanied by an increase in melting rate, any application of salt
beyond minimal amounts would constitute a waste of public money and
possibly cause unnecessary damage to public roadways. It is
hypothesized that the relationship between amount of salt used and the
time required to completely melt a given quantity of ice is negative and
significant.
Likewise, the second question seeks to address the possibility that
dissimilar forms of salt may produce different rates of melting. To
answer this question, table salt and rock salt were included in the
experimental design. Although both are chemically similar, rock salt
consists of larger crystals than does the typical table salt bought in
local supermarkets. Given the greater density and more efficient
packing of NaCl molecules within the larger rock salt crystals, a
specified volume of rock salt will likely contain a greater number of
salt molecules than a similar volume of the less tightly packed table
salt crystals. Therefore, it is hypothesized that rock salt will result
in faster melting times than table salt.
Materials
Tap water
42 six-ounce plastic cups (paper cups tend to break at the seam as the
contents freeze)
Morton brand table salt
Morton brand rock salt
1/2 cup measure
Stop Watch
Procedure
To answer the questions posed above, a balanced 2 x 4 factorial design
was employed with amount of salt identified as a factor consisting of
four levels (i.e. no salt, 1/2 tsp, 1 tsp, 1 tbsp), and the other factor
being type of salt with two levels (i.e. table salt, rock salt). Three
replications were conducted within each cell for a total of 24 runs. A
p-level of .05 was identified for statistical significance prior to the
data collection phase of the project.
Twenty-four small plastic cups were each labeled with a number
designating type of salt, and a letter A-D indicating amount of salt.
Each plastic cup was then filled with 4 ounces of tap water and placed
in the freezer overnight (approximately 16 hours).
Since salt could not be emptied into all of the ice cups
simultaneously, the remaining 18 plastic cups were each labeled and then
used to hold an amount and type of salt corresponding to one of the
experimental conditions. After the ice cups had been removed from the
freezer, each salt cup was quickly emptied into a corresponding ice cup
with matching identification so as to minimize the time interval between
the application of salt to the first and last cups.
After the last cup of salt had been emptied into the appropriate ice
cup, the stopwatch was started. Room temperature during the data
collection phase was approximately 72 degrees Fahrenheit. The time was
recorded for each cup when ice was no longer visible in that cup.
Taken from a 1999 project by Wayde D. Johnson
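The 2 x 4 factorial analysis described above could be sketched as follows (a sketch only: it assumes the Rlab package is loaded and that the columns are named type, amount, and time — hypothetical names based on the descriptions above):

```r
# Sketch of the 2 x 4 factorial analysis of melting time
# (the column names type, amount, and time are assumptions)
data(salt)
mplot(salt$time, salt$amount, salt$type)             # interaction plot
anova(lm(time ~ type * factor(amount), data = salt)) # two-way ANOVA
```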
3rd moment skewness ratio
skew(x)
x |
vector |
kurt
set.seed(1)
x <- rexp(100)
# Get skewness coefficient estimate for exponential distribution
skew(x)
The data set solar
is from an experiment investigating the effect
of surface area (cell), distance, and light intensity on the output voltage
of photovoltaic cells (solar cells).
A data frame with 18 observations on the following 4 variables.
: surface area (0 for 8 sq. cm and 1 for 3 sq. cm)
: light intensity
: distance
: solar cell's output voltage
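An exploratory look at how the three factors relate to output voltage might be sketched as follows (a sketch only: the column names cell, intensity, distance, and voltage are hypothetical, since the actual names are not shown above):

```r
# Sketch: relate output voltage to the three experimental factors
# (the column names cell, intensity, distance, and voltage are assumptions)
data(solar)
pairs(solar)  # pairwise scatterplots of all four variables
summary(lm(voltage ~ cell + intensity + distance, data = solar))
```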
Data collected by a team of civil engineering students in an attempt to determine if the suggested 24 hour soak time for measuring the specific gravity and absorption of coarse aggregate (e.g., granite or limestone) was really necessary. They obtained samples of coarse aggregate from eight quarries and measured three types of specific gravity (BDSG, SSDSG and ASG), and absorption (abs) at five soak times (6, 18, 24, 48, and 72 hours).
A data frame with 40 observations on the following 7 variables.
: quarry sample was taken from
: type of rock (Granite or Limestone)
: soak time (6, 18, 24, 48, or 72 hours)
: dry specific gravity
: SSD specific gravity
: apparent specific gravity
: absorption
Various summary statistics are calculated for different types of data.
stats(x, by)
x |
The data structure to compute the statistics. This can either be a vector, matrix (data sets are the columns), or a list (data sets are the components). |
by |
If x is a vector, an optional vector (either character or numerical) specifying the categories to divide x into separate data sets. |
Stats breaks x up into separate data sets and then calls describe to calculate the statistics. Statistics are found by columns for matrices, by components for a list, and by the relevant groups when a numeric vector and a by vector are given. The default set of statistics are the number of (nonmissing) observations, mean, standard deviation, minimum, lower quartile, median, upper quartile, maximum, and number of missing observations. If any data set is nonnumeric, missing values are returned for the statistics. The by argument is a useful way to calculate statistics on parts of a data set according to different cases.
A matrix where rows index the summary statistics and the columns index the separate data sets.
stats.bplot, mean, sd
# Statistics for 8 normal random samples:
zork <- matrix(rnorm(200), ncol = 8)
stats(zork)
zork <- rnorm(200)
id <- sample(1:8, 200, replace = TRUE)
stats(zork, by = id)
Various summary statistics are calculated for different types of data. Same as stats with addition of skewness and kurtosis.
stats2(x, by, digits=8)
x |
The data structure to compute the statistics. This can either be a vector, matrix (data sets are the columns), or a list (data sets are the components). |
by |
If x is a vector, an optional vector (either character or numerical) specifying the categories to divide x into separate data sets. |
digits |
Default number of digits is 8. This allows it to be set smaller. |
Stats breaks x up into separate data sets and then calls describe to calculate the statistics. Statistics are found by columns for matrices, by components for a list, and by the relevant groups when a numeric vector and a by vector are given. The default set of statistics are the number of (nonmissing) observations, mean, standard deviation, skewness, kurtosis, minimum, lower quartile, median, upper quartile, maximum, and number of missing observations. If any data set is nonnumeric, missing values are returned for the statistics. The by argument is a useful way to calculate statistics on parts of a data set according to different cases.
A matrix where rows index the summary statistics and the columns index the separate data sets.
stats, stats.bplot, mean, sd
# Statistics for 8 normal random samples:
zork <- matrix(rnorm(200), ncol = 8)
stats2(zork)
zork <- rnorm(200)
id <- sample(1:8, 200, replace = TRUE)
stats2(zork, by = id)
The data set us.age
contains the average age for all Americans (all),
females (f), and males (m) for the years 1990-1999. Actually, the data from
the US Census Bureau are based on the 1990 census and then updated yearly.
A data frame with 10 observations on the following 4 variables.
: average age of all Americans
: average age of female Americans
: average age of male Americans
: 4 digit year
US Census Bureau
The data set us.pop
contains the United States population for the
years 1900-1999.
A data frame with 100 observations on the following 2 variables.
: 2 digit year
: population (in millions)
View the first rows (10, by default) of a data set. Column names are displayed if appropriate.
view(x, maxlines = 10)
x |
data set to be viewed; can be data.frame, matrix, list or vector. |
maxlines |
maximum number of rows to be displayed. |
If the data set contains more rows than maxlines, a message indicating the number of unviewed rows is displayed. If the data set contains fewer rows than maxlines, only those rows are displayed.
ls, objects
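A minimal usage sketch (assuming the Rlab package is loaded; ncsu is one of the package's data sets):

```r
# Sketch: inspect the first rows of a data set
data(ncsu)
view(ncsu)                # first 10 rows (the default), with column names
view(ncsu, maxlines = 5)  # only the first 5 rows
```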
The viscosity data frame was the result of the following ST370 experiment performed in the fall of 1996. The students' description is as follows.
For this experiment we used three different liquids: water, cooking oil and shampoo. First we placed a cup of shampoo in a microwave oven and heated it for 50 seconds. Immediately after that we transferred the liquid to a dishwashing container. We turned this container upside down with the spout closed and poked a hole in the bottom part of it. Then we placed a half-cup measuring container beneath the dishwashing container. Then we opened the spout of the dishwashing container and measured the time it took for liquid to come out and fill the half-cup container. We repeated the same procedure with each liquid three times. Then we placed the liquids at room temperature in the container and repeated the above procedure three times as well. Then we placed each liquid in the freezer 10 minutes at a time and repeated the prior procedure three times.
A data frame with 26 observations on the following 3 variables.
: liquid used (shampoo, oil or water)
: temperature (hot or cold)
: time (in seconds)
The data set vocab
contains the average oral vocabulary size (words)
for children at different ages (age).
A data frame with 10 observations on the following 2 variables.
: age (in years)
: vocabulary (in number of words)
Discovering Psychology by Weiner, 1977
The data set webhost
contains the results of a comparison of
different webhosts. A student team decided to compare four free hosting
services in the spring of 2000: go.com, angelfire.com, geocities.com,
and xoom.com. They uploaded four pages:
* one with text only (100k),
* one with text only (100k) and one 20k jpeg image
* one with text only (100k) and two 20k jpeg images
* one with text only (100k) and three 20k jpeg images
The last page would not load for xoom, so they had 15 data points with load
times for the response variable and the number of graphic images for a
quantitative independent variable.
A data frame with 15 observations on the following 3 variables.
: number of 20k jpeg images on the uploaded page
: time to upload the page (in seconds)
: web host (angelfire, geocities, go or xoom)
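Since the number of images is a quantitative independent variable, a simple regression of load time on image count could be sketched as follows (a sketch only: the column names images, time, and host are hypothetical, since the actual names are not shown above):

```r
# Sketch: load time versus number of images, labeled by host
# (the column names images, time, and host are assumptions)
data(webhost)
lplot(webhost$images, webhost$time, webhost$host)  # label points by host
summary(lm(time ~ images, data = webhost))         # straight-line fit
```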
Density, distribution function, quantile function and random generation for the Weibull distribution with parameters alpha (or shape) and beta (or scale). This special Rlab implementation allows the parameters alpha and beta to be used, to match the function description often found in textbooks.
dweibull(x, shape, scale = 1, alpha = shape, beta = scale, log = FALSE)
pweibull(q, shape, scale = 1, alpha = shape, beta = scale,
         lower.tail = TRUE, log.p = FALSE)
qweibull(p, shape, scale = 1, alpha = shape, beta = scale,
         lower.tail = TRUE, log.p = FALSE)
rweibull(n, shape, scale = 1, alpha = shape, beta = scale)
x, q |
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. If length(n) > 1, the length is taken to be the number required. |
shape, scale |
shape and scale parameters, the latter defaulting to 1. |
alpha, beta |
alpha and beta parameters, the latter defaulting to 1. |
log, log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]. |
The Weibull distribution with alpha (or shape) parameter a and beta (or scale) parameter b has density given by

f(x) = (a/b) (x/b)^(a-1) exp(-(x/b)^a)

for x > 0. The cumulative distribution function is F(x) = 1 - exp(-(x/b)^a), the mean is E(X) = b Gamma(1 + 1/a), and the variance is Var(X) = b^2 (Gamma(1 + 2/a) - (Gamma(1 + 1/a))^2).
dweibull gives the density, pweibull gives the distribution function, qweibull gives the quantile function, and rweibull generates random deviates.
The cumulative hazard H(t) = -log(1 - F(t)) is
-pweibull(t, a, b, lower = FALSE, log = TRUE),
which is just H(t) = (t/b)^a.
dexp
for the Exponential which is a special case of a
Weibull distribution.
x <- c(0, rlnorm(50))
all.equal(dweibull(x, alpha = 1), dexp(x))
all.equal(pweibull(x, alpha = 1, beta = pi), pexp(x, rate = 1/pi))
## Cumulative hazard H():
all.equal(pweibull(x, 2.5, pi, lower = FALSE, log = TRUE),
          -(x/pi)^2.5, tol = 1e-15)
all.equal(qweibull(x/11, alpha = 1, beta = pi), qexp(x/11, rate = 1/pi))
The data set wire
contains the results of an experiment to study
the relationship between the resistance of a wire and its gauge and its
length.
A data frame with 27 observations on the following 3 variables.
: wire gauge (in AWG units)
: wire length (in feet)
: resistance (in ohms)
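A sketch of how resistance might be modeled from gauge and length (a sketch only: the column names gauge, length, and resistance are hypothetical, since the actual names are not shown above):

```r
# Sketch: resistance as a function of wire length and gauge
# (the column names gauge, length, and resistance are assumptions)
data(wire)
lplot(wire$length, wire$resistance, wire$gauge)  # label points by gauge
summary(lm(resistance ~ length * gauge, data = wire))
```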
Adds vertical lines in the plot region.
xline(x, ...)
x |
Values on x axis specifying location of vertical lines. |
... |
Any plotting options for abline. |
yline, abline
plot(1:10)
xline(6.5, col = 2)
The data set yarnred
is from a team project to investigate the effect
of count number and yarn type on the tensile strength of yarn. The Uster
Tensorapid 3, a standard machine for measuring tensile strength, was used
to take 10 measurements of strength from each of 6 yarn cones (one of each
type and count, 10 feet between measurements on each cone).
A data frame with 60 observations on the following 3 variables.
: thread count (24 or 36)
: yarn type (AirJet, OpenEnd or Ring)
: tensile strength
The data set yarnred
is from a team project to investigate the effect
of count number and yarn type on the tensile strength of yarn. The Uster
Tensorapid 3, a standard machine for measuring tensile strength, was used
to take 10 measurements of strength from each of 6 yarn cones (one of each
type and count, 10 feet between measurements on each cone). The 10
measurements for each cone are not true replications because the yarn cones
should be the experimental unit. Thus we have reduced the data by averaging
over the 10 measurements for each cone.
A data frame with 6 observations on the following 4 variables.
: thread count (24 or 36)
: yarn type (AirJet, OpenEnd or Ring)
: tensile strength (average of 10 measurements)
: standard deviation among the 10 measurements
Adds horizontal lines in the plot region.
yline(y, ...)
y |
Values on y axis specifying location of horizontal lines. |
... |
Any plotting options for abline. |
xline, abline
plot(1:10)
yline(4.0, col = 3)