| Title: | Longitudinal Sports Analytics Asset and Workload Feature Processing |
|---|---|
| Description: | A synthetic, longitudinal athletic dataset generated through a transparent, rule-based simulation engine. Captures individual activity sessions across multiple athletes, environmental conditions, and physiological responses. Specifically designed as an alternative to legacy teaching datasets by introducing realistic hierarchical repeated measures, complex two-way covariate interactions, and a deliberate Missing Not At Random (MNAR) tracking mechanism suitable for advanced imputation workflows. Methodologies implemented are based on van Buuren (2018) <doi:10.1201/9780429492259> and Bates et al. (2015) <doi:10.18637/jss.v067.i01>. |
| Authors: | Mohammad Abbas [aut, cre] |
| Maintainer: | Mohammad Abbas <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-30 21:37:27 UTC |
| Source: | https://github.com/cran/sportsfeatures |
A convenience helper that loads one of the package's bundled sports features datasets and returns it, so the result can be assigned directly to a variable.
get_sportsdata(type = c("complete", "missing"))
| Argument | Description |
|---|---|
| type | A character string specifying which dataset variant to load: either "complete" or "missing". |
A tibble/data.frame containing the requested sports feature dataset.
    # Get the clean complete dataset
    clean_data <- get_sportsdata(type = "complete")

    # Get the dataset containing systematic missingness
    missing_data <- get_sportsdata(type = "missing")
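The usage signature `type = c("complete", "missing")` follows the common R idiom in which the first listed choice acts as the default. The sketch below illustrates that idiom with `match.arg()`. It is not the package's source code: `get_toy_sportsdata()`, its toy rows, and the `heart_rate` column name are hypothetical stand-ins used only for illustration.

```r
# Hypothetical stand-in for the package helper; NOT the package's actual source.
get_toy_sportsdata <- function(type = c("complete", "missing")) {
  # match.arg() selects the first choice ("complete") when `type` is not
  # supplied and raises an error for values outside the allowed set.
  type <- match.arg(type)

  # Tiny illustrative data; the column names here are assumptions.
  complete <- data.frame(
    athlete_id  = c("A1", "A1", "A2"),
    distance_km = c(5.0, 8.2, 10.1),
    heart_rate  = c(142, 155, 149)
  )

  if (type == "complete") {
    return(complete)
  }

  # The "missing" variant is the same data with one value blanked out.
  with_gaps <- complete
  with_gaps$heart_rate[2] <- NA
  with_gaps
}

default_data <- get_toy_sportsdata()           # same as type = "complete"
missing_data <- get_toy_sportsdata("missing")  # contains one NA

# An unsupported value is rejected by match.arg().
bad_call <- tryCatch(get_toy_sportsdata("partial"), error = function(e) "error")
```

If the real helper is written this way, calling `get_sportsdata()` with no argument would return the complete variant; that behaviour is an assumption and should be checked against the package itself.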
Comprehensive Sports Features Dataset
sports_features
A tibble or data frame with 25 variables describing athlete sessions and performance metrics:
Unique alphanumeric identifier for each training session.
Unique alphanumeric identifier for each athlete.
Timestamp of when the training session occurred.
Type of exercise performed (e.g., running, cycling, swimming).
Geographical area where the session took place.
Total distance covered during the session in kilometers.
Weather condition during the session (e.g., sunny, rainy, cloudy).
Ambient outdoor temperature in degrees Celsius.
Pre-activity physical or mental status reported by the athlete.
Logical indicator (TRUE/FALSE) if the session was done with a group.
Categorical gender of the athlete.
Age of the athlete in years.
Baseline fitness score of the athlete.
Baseline average speed capability of the athlete.
Baseline stamina level of the athlete.
Baseline body weight of the athlete in kilograms.
Baseline resting heart rate in beats per minute (bpm).
Type of tracking device used during the session.
Average speed maintained throughout the session in km/h.
Total duration of the training session in minutes.
Average heart rate monitored during the session in bpm.
Estimated total energy expenditure in kilocalories (kcal).
Subjective exhaustion level reported after the session.
Hydration level (%) recorded during or after the session.
Calculated post-activity fatigue accumulation score.
A rich, synthetic sports analytics dataset containing tracking metrics, environmental contexts, physiological markers, and performance data for athletes.
Synthesized sports features analytics framework.
    library(tidyverse)
    library(lme4)

    # Load the package data
    data("sports_features")

    # Downsample data for the example to ensure fast execution time (< 2.5s)
    demo_data <- head(sports_features, 500)

    # ----------------------------------------------------
    # DEMO 1: Linear Regression (Fixed Effects)
    # Predicting fatigue score based on workload metrics
    # ----------------------------------------------------
    lm_model <- lm(fatigue_score ~ distance_km + duration_min + speed_kmh + temperature_c,
                   data = demo_data)
    summary(lm_model)

    # ----------------------------------------------------
    # DEMO 2: Linear Mixed-Effects Model (Hierarchical MML)
    # Controlling for variation across individual athletes (athlete_id)
    # ----------------------------------------------------
    mml_model <- lmer(fatigue_score ~ distance_km + duration_min + speed_kmh + temperature_c +
                        (1 | athlete_id),
                      data = demo_data)
    summary(mml_model)
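The second demo adds a random intercept per athlete because repeated sessions from the same athlete tend to resemble each other. As a rough, self-contained illustration of that idea, the sketch below simulates toy repeated-measures data in base R and estimates the intraclass correlation (ICC) from a one-way ANOVA. The simulated values, sample sizes, and standard deviations are assumptions chosen for the example and are not taken from `sports_features`.

```r
set.seed(42)

n_athletes <- 20   # assumed number of athletes in the toy data
n_sessions <- 25   # assumed sessions per athlete (balanced design)

# Integer index mapping every session to its athlete.
athlete_index <- rep(seq_len(n_athletes), each = n_sessions)
athlete_id <- factor(athlete_index)

# Each athlete gets a persistent offset (between-athlete SD = 2),
# and each session adds independent noise (within-athlete SD = 1).
athlete_effect <- rnorm(n_athletes, mean = 0, sd = 2)
fatigue_score <- 50 + athlete_effect[athlete_index] +
  rnorm(n_athletes * n_sessions, mean = 0, sd = 1)

# One-way ANOVA decomposition of the variance.
aov_table  <- anova(lm(fatigue_score ~ athlete_id))
ms_between <- aov_table["athlete_id", "Mean Sq"]
ms_within  <- aov_table["Residuals", "Mean Sq"]

# Method-of-moments estimate of the between-athlete variance (balanced case).
var_between <- (ms_between - ms_within) / n_sessions

# ICC: share of total variance attributable to differences between athletes.
icc <- var_between / (var_between + ms_within)
icc
```

An ICC well above zero suggests that observations within an athlete are correlated, which is the situation the `(1 | athlete_id)` term in the mixed model is meant to account for.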
Comprehensive Sports Features Dataset (With Missing Values)
sports_features_missing
A tibble or data frame with 25 variables containing structured missing values.
A variant of the core sports analytics dataset containing structured missingness (NA values) across performance tracking columns to demonstrate imputation workflows.
Synthesized sports features analytics framework.
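To make the idea of structured (MNAR) missingness concrete, the following base R sketch simulates a single toy variable whose chance of being missing depends on its own unobserved value, then compares the complete-case mean with the true mean. It does not reproduce the mechanism used to build `sports_features_missing`, which this section does not describe; the variable name `heart_rate` and every numeric setting are assumptions made for illustration.

```r
set.seed(1)

n <- 1000
# Toy session-level heart rates (assumed distribution, not package data).
heart_rate <- rnorm(n, mean = 150, sd = 15)

# MNAR mechanism: the probability of a reading being lost rises with the
# value itself, e.g. a tracker that drops out during intense efforts.
p_missing <- plogis((heart_rate - 160) / 5)
observed  <- ifelse(runif(n) < p_missing, NA, heart_rate)

true_mean          <- mean(heart_rate)
complete_case_mean <- mean(observed, na.rm = TRUE)
missing_share      <- mean(is.na(observed))

# Compare the estimate from observed values only with the full-data mean.
round(c(true_mean = true_mean,
        complete_case_mean = complete_case_mean,
        missing_share = missing_share), 2)
```

Because high values are lost more often, the complete-case mean falls below the true mean in this toy setting. This kind of bias is what makes MNAR data a demanding test case for imputation workflows.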