Title: | Collection of Data Sets for Teaching Purposes |
---|---|
Description: | Collection (syllogi in Greek) of real and fictitious data sets for teaching purposes. The datasets were manually entered by the author from the respective references, as listed in the individual dataset documentation. The fictitious datasets were created by the author, who has found them useful for teaching statistics. |
Authors: | Jared Studyvin [aut, cre] |
Maintainer: | Jared Studyvin <[email protected]> |
License: | Apache License (>= 2) |
Version: | 1.0.3 |
Built: | 2024-12-06 06:27:42 UTC |
Source: | CRAN |
Data.frame
data(alligatorDiet)
The data frame has 16 rows and 8 variables:
Lake in Florida where the alligator was captured.
Female (F) or Male (M).
Small (<= 2.3 m) or big (> 2.3 m).
Number of alligators whose primary stomach contents were fish.
Number of alligators whose primary stomach contents were invertebrate.
Number of alligators whose primary stomach contents were reptile.
Number of alligators whose primary stomach contents were bird.
Number of alligators whose primary stomach contents were other.
A study at four lakes in Florida captured 219 alligators. The primary food type found in each alligator's stomach was recorded, along with the gender, lake of capture, and size of the alligator.
Agresti, A. (2013) Categorical Data Analysis. 3rd Edition, John Wiley & Sons, Hoboken, New Jersey.
data("alligatorDiet", package='syllogi')
str(alligatorDiet)
Data.frame
data(alligatorLength)
The data frame has 63 rows and 3 variables:
Female (F) or Male (M).
Length of the alligator in meters. Subadult alligators are shorter than 1.83 m and adults are longer than 1.83 m.
Primary stomach contents of the alligator.
A study in Lake George, Florida caught 63 alligators. Each alligator's stomach contents were classified as fish, invertebrate, or other. The sex and the length of the alligator were also recorded.
Agresti, A. (2013) Categorical Data Analysis. 3rd Edition, John Wiley & Sons, Hoboken, New Jersey.
data("alligatorLength", package='syllogi')
str(alligatorLength)
Data.frame
data(annualSales)
The data frame has 12 rows and 3 variables:
Annual gross sales in thousands of dollars.
Annual cost of advertising in thousands of dollars.
Quality of the store's typical product: 0 = very poor quality to 25 = exceptional quality.
You are hired as a statistical consultant. Twelve stores in the Fort Collins, CO area have asked you to develop a prediction model for their annual gross sales (sales; measured in thousands of dollars). They would like to know if it is possible to predict the amount of their sales by knowing how much they spend annually on advertising (advert; measured in thousands of dollars) and the quality of their store’s typical product (quality; measured on a scale from 0 = very poor quality to 25 = exceptional quality).
fictitious data set
data("annualSales", package='syllogi')
str(annualSales)
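The consulting question above can be approached with a multiple linear regression. The following is a minimal sketch, assuming the columns are named sales, advert, and quality as the description suggests. The helper fit_sales_model and the new-store values passed to predict are hypothetical and not part of the package.

```r
# Hypothetical helper: regress gross sales on advertising spend and quality.
# Assumes columns named sales, advert, and quality (taken from the description).
fit_sales_model <- function(d) {
  lm(sales ~ advert + quality, data = d)
}

# Run the example only if the package is installed.
if (requireNamespace("syllogi", quietly = TRUE)) {
  data("annualSales", package = "syllogi")
  fit <- fit_sales_model(annualSales)
  print(summary(fit))
  # Prediction interval for an illustrative (made-up) store:
  # $10,000 of advertising and a quality score of 15.
  new_store <- data.frame(advert = 10, quality = 15)
  print(predict(fit, newdata = new_store, interval = "prediction"))
}
```

The coefficient table from summary() shows whether advertising and quality each contribute to predicting sales once the other is accounted for.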
Bighorn Sheep data
data(bighornSheep)
The data frame has 8000 rows (each a geographic sample unit) and 15 variables:
Sample unit ID; 150 m circles randomly overlaid across the study area.
Count of use by bighorn sheep.
Average slope (degrees) within the sampling unit.
Average elevation (m) within the sampling unit.
Distance (m) from the sampling unit center to the nearest burned habitat edge, calculated after the fire event.
Distance (m) from the sampling unit center to the nearest road.
Distance (m) from the sampling unit center to the nearest escape terrain (slope > 27 degrees).
Distance (m) from the sampling unit center to the nearest perennial water source.
Dominant cardinal direction within each sampling unit.
1 = after fire, 0 = before fire.
Season: summer or winter.
Twelve female bighorn sheep were radio collared and tracked. The locations of their use points were recorded before and after a forest fire.
Clapp, J.G., Beck, J.L. Short-Term Impacts of Fire-Mediated Habitat Alterations on an Isolated Bighorn Sheep Population. Fire Ecology 12, 80–98 (2016). https://doi.org/10.4996/fireecology.1203080
data('bighornSheep', package='syllogi')
str(bighornSheep)
Data.frame
data(bladderCancer)
The data frame has 31 rows and 3 variables:
0 = small primary tumor (< 3 cm) and 1 = large primary tumor (> 3 cm).
Number of tumors.
Follow-up time in months.
Study of tumor recurrence in bladder cancer patients. Each patient had previously received surgery to remove a primary tumor, and the size of the removed primary tumor was recorded. After different follow-up times, the number of recurring tumors was recorded.
Bilder, C.R., & Loughin, T.M. (2014). Analysis of Categorical Data with R (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b17211
data("bladderCancer", package='syllogi')
str(bladderCancer)
Data.frame
data(butterflyPlot)
The data frame has 40 rows and 2 variables:
Plot area size in hectares.
Count of number of unique species.
Plots ranging in size from 1 ha to 1000 ha were left uncut in a larger landscape of logged tropical rainforest. In each plot the number of unique butterfly species was recorded. What is the relationship between plot size and unique species count?
fictitious data set
data("butterflyPlot", package='syllogi')
str(butterflyPlot)
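One reasonable starting point for the plot size question is a Poisson regression of the species count on the log of plot area. The sketch below is not the author's prescribed analysis. It assumes the first column holds the plot area and the second the species count, following the order of the variable descriptions; the helper fit_species_area and the names area and species are hypothetical.

```r
# Hypothetical helper: Poisson regression of species count on log plot area.
# Assumes column 1 = plot area (ha) and column 2 = species count, in the
# order the variables are documented; check names(butterflyPlot) first.
fit_species_area <- function(d) {
  dat <- data.frame(area = d[[1]], species = d[[2]])
  glm(species ~ log(area), family = poisson, data = dat)
}

# Run the example only if the package is installed.
if (requireNamespace("syllogi", quietly = TRUE)) {
  data("butterflyPlot", package = "syllogi")
  fit <- fit_species_area(butterflyPlot)
  print(coef(summary(fit)))
}
```

With a log link and a log-transformed area, the slope can be read as the exponent of a power-law species-area relationship.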
Self reported level of depression and other associated metrics.
data(depression)
An object of class data.frame with 50 rows and 13 columns.
This is a fictitious dataset useful for teaching how to use and interpret linear statistical models. The variables are:
Level of Education: (1) professional degree (non-college), (2) 2 years of college, (3) 2+ years of college, but not a BS degree, (4) BS degree, (5) MS degree
Annual Income: 1 = $10,000 to $19,999; 2 = $20,000 to $29,999; ... 9 = $90,000 to $99,999; 10 = $100,000 or more
Experience of Trauma; Percent of Life Events Viewed as Traumatic: 0 = 0%, 1 = 10%, 2 = 20%, ..., 9 = 90%, 10 = 100%
Satisfied with your Life: 0 = No, 1 = Yes
Feeling of Control; How much do you feel in control: 0 = Not at all, 1 = A Little, 2 = Some, 3 = A Lot, 4 = Completely
Family History of Depression: 0 = No, 1 = Yes
Weekly Amount of Exercise: 0 = None, 1 = 1 Hour, 2 = 2 Hours, 3 = 3 Hours, 4 = 4 Hours, 5 = 5 or more Hours
3-methoxy-4-hydroxyphenylethyleneglycol, Depression Related Chemical Secreted in Urine; milligrams secreted per 24 hour period, labeled as mg/24h: 0 = 0 mg/24h, 1 = 100 mg/24h, ..., 9 = 900 mg/24h, 10 = 1000+ mg/24h
Amount of Sleep Problems: 0 = None, 1 = 10% of the time, ... , 9 = 90% of the time, 10 = 100% of the time
Perceived Level of Depression: 0 = None, 1 = 10% of the time, ... , 9 = 90% of the time, 10 = 100% of the time
Do I consider myself depressed: 0 = No, 1 = Yes
Feeling of Well Being; how often do you feel good about yourself: 0 = None, 1 = 10% of the time, ... , 9 = 90% of the time, 10 = 100% of the time
Your Sex: 0 = Male, 1 = Female
fictitious data set
Data.frame
data(dogFood)
The data frame has 25 rows and 2 variables:
The type of dog food: our dog food or one of the four top sellers.
The percent weight gain.
You are hired as a statistical consultant for a dog food manufacturing company. The engineers who designed the company's dog food would like to know how it compares to the current top-selling dog food brands on the market. To answer this question, 25 puppies of the same breed and age (within a week of each other) were chosen for this study. Five puppies were assigned to each dog food type. After 4 weeks the percent of weight gained for each puppy was determined.
fictitious data set
data("dogFood", package='syllogi')
str(dogFood)
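The comparison described above fits a one-way analysis of variance followed by pairwise comparisons. The sketch below is one possible approach, assuming the first column is the dog food type and the second is the percent weight gain, in the documented order. The helper compare_foods and the names food and gain are hypothetical.

```r
# Hypothetical helper: one-way ANOVA of weight gain by food type,
# followed by Tukey's honest significant difference comparisons.
# Assumes column 1 = food type and column 2 = percent weight gain.
compare_foods <- function(d) {
  dat <- data.frame(food = factor(d[[1]]), gain = as.numeric(d[[2]]))
  fit <- aov(gain ~ food, data = dat)
  list(anova = summary(fit), tukey = TukeyHSD(fit))
}

# Run the example only if the package is installed.
if (requireNamespace("syllogi", quietly = TRUE)) {
  data("dogFood", package = "syllogi")
  res <- compare_foods(dogFood)
  print(res$anova)
  print(res$tukey)
}
```

The ANOVA table tests whether any food differs in mean weight gain, and the Tukey intervals show which pairs of foods differ.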
List of the Federalist Papers
data(federalistPapers)
The list has 86 elements; each element is itself a list with 2 elements. The paper element is the text of the paper. The meta element is a data frame with the following variables:
Paper number.
Author of the paper.
Title of the paper.
Newspaper that published the paper.
Date of publication.
The Project Gutenberg version of the Federalist Papers attributes paper No. 58 to Madison, but Mosteller and Wallace consider this paper to have disputed authorship. Thus, this version considers No. 58 authorship to be disputed.
Project Gutenberg has two slightly different versions of No. 70; both are included.
https://www.gutenberg.org/ebooks/18
Mosteller, F. and D. L. Wallace. Inference and Disputed Authorship: The Federalist. Reading, MA., 1964
data("federalistPapers", package='syllogi')
str(federalistPapers)
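Because each element holds a paper element and a meta element, it is often convenient to stack the metadata into a single data frame. The sketch below does that and adds a rough whitespace-based word count. The helper summarise_papers and the column n_words are hypothetical, and the code assumes every meta data frame has the same columns.

```r
# Hypothetical helper: stack the per-paper `meta` data frames into one table
# and add a rough word count computed from each `paper` text.
summarise_papers <- function(papers) {
  meta <- do.call(rbind, lapply(papers, function(p) p$meta))
  meta$n_words <- vapply(papers, function(p) {
    # The text may span several strings, so join them before splitting.
    txt <- paste(p$paper, collapse = " ")
    words <- strsplit(trimws(txt), "\\s+")[[1]]
    sum(nzchar(words))
  }, integer(1))
  rownames(meta) <- NULL
  meta
}

# Run the example only if the package is installed.
if (requireNamespace("syllogi", quietly = TRUE)) {
  data("federalistPapers", package = "syllogi")
  papers <- summarise_papers(federalistPapers)
  print(head(papers))
}
```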
Generic data set with four ratio predictors (X1,X2,X3,X4), two categorical predictors (A,B) and one ratio response variable (Y).
data(genericData)
An object of class data.frame with 60 rows and 7 columns.
This is a fictitious dataset useful for teaching how to use and interpret linear statistical models.
fictitious data set
data("genericData", package='syllogi')
str(genericData)
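As an illustration of the linear models this data set is meant for, the sketch below fits the response on all six predictors, treating A and B as factors. The column names Y, X1 to X4, A, and B are taken from the title above; the helper fit_generic is hypothetical.

```r
# Hypothetical helper: linear model with four numeric predictors and two
# categorical predictors. Column names are taken from the data set title.
fit_generic <- function(d) {
  d$A <- factor(d$A)
  d$B <- factor(d$B)
  lm(Y ~ X1 + X2 + X3 + X4 + A + B, data = d)
}

# Run the example only if the package is installed.
if (requireNamespace("syllogi", quietly = TRUE)) {
  data("genericData", package = "syllogi")
  fit <- fit_generic(genericData)
  print(summary(fit))
  print(anova(fit))  # sequential (Type I) sums of squares
}
```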
Data.frame
data(nutritionCancer)
The data frame has 50 rows and 6 variables:
ID number of each patient.
The age of the patient in years.
The duration or time in months the patient has had breast cancer.
The number of servings the patient eats of fruits and vegetables in a typical day.
Does or did any blood relative (e.g., mother, grandmother, aunt) have breast cancer?
The stage of the cancer: 0 (non-invasive) to IV (very invasive or "advanced" cancer).
Fictitious data set for teaching purposes. The fictitious scenario:
The purpose of a medical study is to examine the relationship between eating fruits and vegetables and breast cancer. To study the relationship, 1500 Caucasian women with breast cancer were randomly selected from the list of cancer patients in the U.S. The first 50 patients have been measured.
Fictitious data set
data("nutritionCancer", package='syllogi')
str(nutritionCancer)
Data.frame
data(osteosarcoma)
The data frame has 8 rows and 5 variables:
Whether the patient has high or low lymphocytic infiltration.
Female (F) or Male (M).
Whether the patient has osteoblastic pathology: yes or no.
Number of patients that are disease free after three years.
Number of patients that are not disease free after three years.
A study of nonmetastatic osteosarcoma recorded whether each patient was disease free after three years, along with the patient's gender, level of lymphocytic infiltration, and whether there was osteoblastic pathology. Can the probability of being disease free after 3 years be predicted?
A M Goorin, A Perez-Atayde, M Gebhardt, J W Andersen, R H Wilkinson, M J Delorey, H Watts, M Link, N Jaffe, and E Frei 3rd. Journal of Clinical Oncology 1987 5:8, 1178-1184.
Agresti, A. (2002) Categorical Data Analysis. 2nd Edition, John Wiley & Sons, Inc., New York, 320-332. http://dx.doi.org/10.1002/0471249688
data("osteosarcoma", package='syllogi')
str(osteosarcoma)
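The prediction question above can be explored with a logistic regression on the grouped counts. The sketch below assumes the five columns appear in the documented order (infiltration, sex, osteoblastic pathology, disease-free count, not-disease-free count). The helper fit_disease_free and the column names it assigns are hypothetical.

```r
# Hypothetical helper: grouped-binomial logistic regression for the
# probability of being disease free after three years.
# Assumes the columns are in the documented order; the names below are
# placeholders assigned by position, not the package's own names.
fit_disease_free <- function(d) {
  dat <- data.frame(infiltration = factor(d[[1]]),
                    sex = factor(d[[2]]),
                    osteoblastic = factor(d[[3]]),
                    free = d[[4]],
                    not_free = d[[5]])
  glm(cbind(free, not_free) ~ infiltration + sex + osteoblastic,
      family = binomial, data = dat)
}

# Run the example only if the package is installed.
if (requireNamespace("syllogi", quietly = TRUE)) {
  data("osteosarcoma", package = "syllogi")
  fit <- fit_disease_free(osteosarcoma)
  print(coef(summary(fit)))
  # Estimated probability of being disease free for each of the 8 groups
  print(cbind(osteosarcoma, p_hat = fitted(fit)))
}
```

The two-column response cbind(successes, failures) is the standard way to pass grouped binomial counts to glm.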
Data.frame
data(patientSatisfaction)
The data frame has 46 rows and 4 variables:
Patient's level of satisfaction, higher value means more satisfied.
Patient's age in years.
Patient's severity of illness, higher value means more severe.
Patient's anxiety level, higher value means more severe.
A hospital administrator wants to predict patients' satisfaction using their age, severity of illness, and anxiety level. Forty-six patients were selected for the study.
Kutner, M. H., Nachtsheim, C., Neter, J., & Li, W. (2005). Applied linear statistical models (5th ed.). McGraw-Hill Irwin.
data("patientSatisfaction", package='syllogi')
str(patientSatisfaction)
Data.frame
data(politicalIdeology)
The data frame has 20 rows and 4 variables:
Female (F) or Male (M).
Democrat (D) or Republican (R)
Very liberal (VL), Slightly Liberal (SL), Moderate (M), Slightly conservative (SC), or Very conservative (VC).
Count of people.
Data from the 1991 U.S. General Social Survey, cross-classifying people by gender, political party, and political ideology.
Bilder, C.R., & Loughin, T.M. (2014). Analysis of Categorical Data with R (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b17211
data("politicalIdeology", package='syllogi')
str(politicalIdeology)
A survey of high school seniors conducted by the National Center for Education Statistics.
data(schoolProgram)
The data frame has 200 rows (each a student) and 11 variables:
Student ID.
Student's gender.
Student's race.
Socioeconomic status of the student's family, with levels low, middle, and high.
Type of school: public or private.
Type of program the student wants to attend after high school.
Student's standardized reading score.
Student's standardized writing score.
Student's standardized math score.
Student's standardized science score.
Student's standardized social studies score.
Two hundred students were randomly selected from the whole cohort in the survey.
https://www.openintro.org/data/index.php?data=hsb2
UCLA Institute for Digital Research & Education - Statistical Consulting.
data("schoolProgram", package='syllogi')
str(schoolProgram)
Data.frame
data(shipGold)
The data frame has 20 rows (each a ship) and 2 variables:
Size of the ship measured in inches on the horizon.
Number of gold pieces on the ship.
Fictitious data set for teaching purposes. The fictitious scenario:
Captain Buck Tooth has taken you prisoner aboard his pirate ship, the Lucky Lemon. He sees from your college transcripts you have taken a couple of statistics courses. Captain Buck Tooth wants you to predict the amount of gold a ship is carrying based on the size of the ship. Specifically, he thinks bigger ships carry more gold. For the last several ships he has looted he measured the height in inches when the ship was still way off on the horizon. The captain also has a good memory and remembers how much gold was taken from each ship in number of pieces.
Fictitious data set
data("shipGold", package='syllogi')
str(shipGold)
Data.frame
data(weightLoss)
The data frame has 60 rows and 2 variables:
Which weight loss drug the participant took for 6 weeks.
Percent of weight loss after the 6 weeks.
Fictitious data set for teaching purposes. The fictitious scenario:
You are a statistical consultant. A client comes to you asking for help with their analysis. The client is from a drug company. Their new drug is supposed to help people lose weight. They conducted an experiment with their drug (drug A) and the two best-selling weight loss drugs (B and C). Male participants aged 50 to 60 were used in the study. Each participant took one of the drugs for 6 weeks and the percent of weight loss was recorded.
Fictitious data set
data("weightLoss", package='syllogi')
str(weightLoss)
Data.frame
data(wheat)
The data frame has 275 rows and 7 variables:
hrw = hard red winter wheat and srw = soft red winter wheat.
Density of a kernel.
Hardness of a kernel.
Size of a kernel.
Weight of a kernel.
Moisture content of a kernel.
Kernel's condition: Healthy, Sprout (sprouted prematurely), or Scab (infected with a fungus).
A study on kernels of wheat was done. There are two classes of wheat: hard and soft red winter wheat. Each kernel was measured for density, hardness, size, weight, and moisture content. Each kernel was classified by visual inspection as healthy, sprouted, or scab. A row in the data frame represents a kernel of wheat.
Bilder, C.R., & Loughin, T.M. (2014). Analysis of Categorical Data with R (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b17211
Martin, C., Herrman, T.J., Loughin, T. and Oentong, S. (1998), Micropycnometer Measurement of Single-Kernel Density of Healthy, Sprouted, and Scab-Damaged Wheats. Cereal Chemistry, 75: 177-180. https://doi.org/10.1094/CCHEM.1998.75.2.177
data("wheat", package='syllogi')
str(wheat)