Title: | Extract Data from NCAA Women's and Men's Volleyball Website |
---|---|
Description: | Extracts team records/schedules and player statistics for the 2020-2024 National Collegiate Athletic Association (NCAA) women's and men's divisions I, II, and III volleyball teams from <https://stats.ncaa.org>. Functions can aggregate statistics for teams, conferences, divisions, or custom groups of teams. |
Authors: | Jeffrey R. Stevens [aut, cre, cph] |
Maintainer: | Jeffrey R. Stevens <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4.1 |
Built: | 2025-01-24 02:09:42 UTC |
Source: | CRAN |
This is a wrapper around group_stats()
that extracts season, match, or pbp
data from players in all teams in the chosen conference. For season stats,
it aggregates all player data and team data into separate data frames and
combines them into a list. For match and pbp stats, it aggregates into a
data frame.
Conference names can be found in ncaa_conferences.
conference_stats(year = NULL, conf = NULL, level = NULL, sport = "WVB", save = FALSE, path = ".")
year |
Numeric vector of years for fall of desired seasons. |
conf |
NCAA conference name. |
level |
Character string defining whether to aggregate "season", "match", or play-by-play ("pbp") data. |
sport |
Three letter abbreviation for NCAA sport (must be upper case; for example "WVB" for women's volleyball and "MVB" for men's volleyball). |
save |
Logical for whether to save the statistics locally as CSVs (default FALSE). |
path |
Character string of path to save statistics files. |
For season level, returns list with data frames of player statistics and team statistics. For match and pbp levels, returns data frame of player statistics and play-by-play information respectively.
Other functions that aggregate statistics: division_stats(), group_stats()
conference_stats(year = 2024, conf = "Peach Belt", level = "season")
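Because the return type depends on level (a list of data frames for "season", a single data frame for "match" and "pbp"), it can help to check the shape of a result before working with it. The following is a minimal sketch, not part of the package: describe_stats() is a hypothetical helper, and the element and column names in the mock input are placeholders, not the names the package actually uses.

```r
# Hypothetical helper (not part of ncaavolleyballr): summarise the shape of
# whatever an aggregation function returned.
describe_stats <- function(x) {
  # Check for a data frame first, because data frames are also lists in R.
  if (is.data.frame(x)) {
    return(data.frame(component = "data", rows = nrow(x), cols = ncol(x)))
  }
  # Otherwise assume a list of data frames (the "season" level).
  data.frame(
    component = names(x),
    rows = unname(vapply(x, nrow, integer(1))),
    cols = unname(vapply(x, ncol, integer(1)))
  )
}

# Mock input standing in for a season-level result (placeholder names/values).
mock <- list(players = data.frame(x = 1:3), teams = data.frame(x = 1L))
describe_stats(mock)
```

With the package installed and network access available, the same helper could be applied to a real result, for example describe_stats(conference_stats(year = 2024, conf = "Peach Belt", level = "season")).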
This is a wrapper around group_stats()
that extracts season, match, or pbp
data from players in all teams in the chosen division. For season stats,
it aggregates all player data and team data into separate data frames and
combines them into a list. For match and pbp stats, it aggregates into a
data frame.
division_stats(year = NULL, division = 1, level = NULL, sport = "WVB", save = FALSE, path = ".")
year |
Numeric vector of years for fall of desired seasons. |
division |
NCAA division (must be 1, 2, or 3). |
level |
Character string defining whether to aggregate "season", "match", or play-by-play ("pbp") data. |
sport |
Three letter abbreviation for NCAA sport (must be upper case; for example "WVB" for women's volleyball and "MVB" for men's volleyball). |
save |
Logical for whether to save the statistics locally as CSVs (default FALSE). |
path |
Character string of path to save statistics files. |
For season level, returns list with data frames of player statistics and team statistics. For match and pbp levels, returns data frame of player statistics and play-by-play information respectively.
Other functions that aggregate statistics: conference_stats(), group_stats()
NCAA datasets use a unique ID for each sport, team, season, and match. This function returns a data frame of dates, opponent team names, and contest IDs for each NCAA contest (volleyball match) for each team and season.
find_team_contests(team_id = NULL)
team_id |
Team ID determined by NCAA for season. To find ID, use
|
Returns a data frame that includes date, team, opponent, and contest ID for each season's contest.
find_team_contests(team_id = "585290")
NCAA datasets use a unique ID for each team and season. To access a team's
data, we must know the volleyball team ID. This function looks up the team ID
from wvb_teams or mvb_teams using the team name.
Team names can be found in ncaa_teams or searched with find_team_name().
find_team_id(team = NULL, year = NULL, sport = "WVB")
team |
Name of school. Must match name used by NCAA. Find exact team
name with |
year |
Numeric vector of years for fall of desired seasons. |
sport |
Three letter abbreviation for NCAA sport (must be upper case; for example "WVB" for women's volleyball and "MVB" for men's volleyball). |
Returns a character string of team ID.
Other search functions:
find_team_name()
find_team_id(team = "Nebraska", year = 2024)
find_team_id(team = "UCLA", year = 2023, sport = "MVB")
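Since a team ID is tied to a single season, a common next step is to pass the ID straight to find_team_contests(). The sketch below chains the two documented calls. team_contests() is a hypothetical wrapper that is not part of the package, and the single-year guard is an assumption made here for clarity, not a documented restriction.

```r
# Hypothetical wrapper (not part of ncaavolleyballr). Nothing is downloaded
# until the function is called; calling it needs the package installed and
# access to stats.ncaa.org.
team_contests <- function(team, year, sport = "WVB") {
  # Assumption: look up one team and one season at a time.
  stopifnot(length(team) == 1, length(year) == 1)
  team_id <- ncaavolleyballr::find_team_id(team = team, year = year, sport = sport)
  ncaavolleyballr::find_team_contests(team_id = team_id)
}

# Example call (requires network access):
# team_contests("Nebraska", 2024)
```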
This is a convenience function to find NCAA team names in ncaa_teams. Once the proper team name is found, it can be passed to find_team_id() or group_stats().
find_team_name(pattern = NULL)
pattern |
Character string of pattern you want to find in the vector of team names. |
Returns a character vector of team names that include the submitted pattern.
Other search functions:
find_team_id()
find_team_name(pattern = "Neb")
NCAA datasets use a unique ID for each sport, team, and season. This function extracts team names, IDs, and conferences for each NCAA team in a division. You should not need this function for volleyball data from 2020-2024, as it has already been used to generate wvb_teams and mvb_teams. It is, however, available for other sports, using the appropriate three letter sport code drawn from ncaa_sports (e.g., men's baseball is "MBA").
get_teams(year = NULL, division = 1, sport = "WVB")
year |
Single numeric year for fall of desired season. |
division |
NCAA division (must be 1, 2, or 3). |
sport |
Three letter abbreviation for NCAA sport (must be upper case; for example "WVB" for women's volleyball and "MVB" for men's volleyball). |
Returns a data frame of all teams, their team ID, division, conference, and season.
This function is a modification of the ncaa_teams() function from the {baseballr} package.
This function aggregates player statistics and play-by-play information within a season by applying player_season_stats(), player_match_stats(), or match_pbp() across groups of teams (for player_season_stats()) or across contests within a season (for player_match_stats() and match_pbp()). For season stats, it aggregates all player data and team data into separate data frames and combines them into a list.
For instance, to extract the data from the teams in the women's 2024 Final Four, pass a vector of c("Louisville", "Nebraska", "Penn St.", "Pittsburgh") to the function. For match or play-by-play data for a team, pass a single team name and year. Team names can be found in ncaa_teams or by using find_team_name().
group_stats(teams = NULL, year = NULL, level = "season", unique = TRUE, sport = "WVB")
teams |
Character vector of team names to aggregate. |
year |
Numeric vector of years for fall of desired seasons. |
level |
Character string defining whether to aggregate "season", "match", or play-by-play ("pbp") data. |
unique |
Logical indicating whether to only process unique contests (TRUE) or whether to process duplicated contests (FALSE). Default is TRUE. |
sport |
Three letter abbreviation for NCAA sport (must be upper case; for example "WVB" for women's volleyball and "MVB" for men's volleyball). |
For season level, returns list with data frames of player statistics and team statistics. For match and pbp levels, returns data frame of player statistics and play-by-play information respectively.
Other functions that aggregate statistics: conference_stats(), division_stats()
group_stats(teams = c("Louisville", "Nebraska", "Penn St.", "Pittsburgh"), year = 2024, level = "season")
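The example above covers the season level. For the match and play-by-play levels, the description asks for a single team name and year, which the following sketch enforces before calling group_stats(). single_team_stats() is a hypothetical helper that is not part of the package, and its guards are assumptions layered on the documented arguments.

```r
# Hypothetical helper (not part of ncaavolleyballr): match-level or
# play-by-play data for one team in one season. Calling it needs the package
# installed and access to stats.ncaa.org.
single_team_stats <- function(team, year, level = c("match", "pbp"), sport = "WVB") {
  # Restrict level to the two per-contest options handled here.
  level <- match.arg(level)
  # Assumption: exactly one team name and one year, per the description.
  stopifnot(is.character(team), length(team) == 1, length(year) == 1)
  ncaavolleyballr::group_stats(teams = team, year = year, level = level, sport = sport)
}

# Example call (requires network access):
# single_team_stats("Nebraska", 2024, level = "match")
```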
The NCAA's page for a match/contest includes a tab called "Play By Play". This function extracts the tables of play-by-play information for each set.
match_pbp(contest = NULL)
contest |
Contest ID determined by NCAA for match. To find ID, use
|
Returns a data frame of set number, teams, score, event, and player responsible for the event.
match_pbp(contest = "6080706")
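To gather play-by-play tables for a hand-picked set of contests (rather than a whole season through group_stats()), the call can be repeated per contest ID and the results stacked. This is a minimal sketch under stated assumptions: collect_pbp() and the contest_id column are hypothetical names introduced here, every returned table is assumed to share the same columns, and the stand-in fetcher returns made-up rows purely so the example runs offline.

```r
# Hypothetical helper (not part of ncaavolleyballr). The fetch argument
# defaults to match_pbp() but can be swapped for a stand-in when offline.
collect_pbp <- function(contests, fetch = ncaavolleyballr::match_pbp) {
  tables <- lapply(contests, function(id) {
    pbp <- fetch(contest = id)
    # Tag each row with the contest it came from (hypothetical column name).
    pbp$contest_id <- rep(id, nrow(pbp))
    pbp
  })
  # Stack the per-contest tables; assumes identical columns across contests.
  do.call(rbind, tables)
}

# Offline demonstration with a stand-in fetcher and made-up rows.
fake_fetch <- function(contest) {
  data.frame(set = c(1, 1), event = c("Serve", "Kill"))
}
collect_pbp(c("A1", "B2"), fetch = fake_fetch)
```

With the package installed and network access available, collect_pbp("6080706") would use match_pbp() itself.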
This data frame includes all men's NCAA Division 1 and 3 teams from 2020-2024.
mvb_teams
A data frame with 873 rows and 6 columns:
Team ID for season/year
Team name
Conference ID
Conference name
NCAA division number (1 or 3)
Year for fall of season
Other data sets: ncaa_conferences, ncaa_sports, ncaa_teams, wvb_teams
head(mvb_teams)
This vector includes names for all NCAA volleyball conferences.
ncaa_conferences
A character vector with 111 conference names.
Other data sets: mvb_teams, ncaa_sports, ncaa_teams, wvb_teams
head(ncaa_conferences)
This data frame includes all NCAA women's and men's sports and the codes used to refer to the sports.
ncaa_sports
A data frame with 100 rows and 2 columns:
Sport code
Sport name
https://ncaaorg.s3.amazonaws.com/championships/resources/common/NCAA_SportCodes.pdf
Other data sets: mvb_teams, ncaa_conferences, ncaa_teams, wvb_teams
head(ncaa_sports)
This vector includes names for all NCAA volleyball teams.
ncaa_teams
A character vector with 1,089 team names.
Other data sets: mvb_teams, ncaa_conferences, ncaa_sports, wvb_teams
head(ncaa_teams)
The NCAA's page for a match/contest includes a tab called "Individual Statistics". This function extracts the tables of player match statistics for both home and away teams, as well as team statistics (though these can be omitted). If a particular team is specified, only that team's statistics will be returned.
player_match_stats(contest = NULL, team = NULL, team_stats = TRUE, sport = "WVB")
contest |
Contest ID determined by NCAA for match. To find ID, use
|
team |
Name of school. Must match name used by NCAA. Find exact team
name with |
team_stats |
Logical indicating whether to include (TRUE) or exclude (FALSE) team statistics. Default includes team statistics with player statistics. |
sport |
Three letter abbreviation for NCAA sport (must be upper case; for example "WVB" for women's volleyball and "MVB" for men's volleyball). |
By default, returns data frame that includes both home and away team match statistics. If team is specified, only that team's data are returned.
Other functions that extract player statistics:
player_season_stats()
player_match_stats(contest = "6080706")
The NCAA's main page for a team includes a tab called "Team Statistics". This function extracts the table of player statistics for the season, as well as team and opponent statistics (though these can be omitted).
player_season_stats(team_id, team_stats = TRUE)
team_id |
Team ID determined by NCAA for season. To find ID, use
|
team_stats |
Logical indicating whether to include (TRUE) or exclude (FALSE) team statistics. Default includes team statistics with player statistics. |
Returns a data frame of player statistics. Note that hometown and high school were added in 2024.
Other functions that extract player statistics:
player_match_stats()
player_season_stats(team_id = "585290")
The NCAA's main page for a team includes a tab called "Game By Game" and a section called "Game by Game Stats". This function extracts the team's summary statistics for each match of the season.
team_match_stats(team_id = NULL, opponent = FALSE, sport = "WVB")
team_id |
Team ID determined by NCAA for season. To find ID, use
|
opponent |
Logical indicating whether to include team's stats (FALSE) or opponent's stats (TRUE). Default is set to FALSE, returning team stats. |
sport |
Three letter abbreviation for NCAA sport (must be upper case; for example "WVB" for women's volleyball and "MVB" for men's volleyball). |
Returns a data frame of summary team statistics for each match of the season.
Other functions that extract team statistics: team_season_info(), team_season_stats()
team_match_stats(team_id = "585290")
The NCAA's main page for a team includes a tab called "Schedule/Results". This function extracts information about the team's venue, coach, and records, as well as the table of the schedule and results. This returns a list, so you can subset specific components with $ (e.g., for coach information from an object called output, use output$coach).
team_season_info(team_id = NULL)
team_id |
Team ID determined by NCAA for season. To find ID, use
|
Returns a list that includes arena, coach, schedule, and record information.
Other functions that extract team statistics: team_match_stats(), team_season_stats()
team_season_info(team_id = "585290")
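Because the result is a list, individual components can be pulled out with $ as described above. The sketch below wraps that pattern for the coach component named in the description. season_coach() is a hypothetical helper that is not part of the package; the other component names are not listed here, so names(output) is the safest way to discover them.

```r
# Hypothetical helper (not part of ncaavolleyballr): return only the coach
# component of a team's season information. Calling it needs the package
# installed and access to stats.ncaa.org.
season_coach <- function(team_id) {
  output <- ncaavolleyballr::team_season_info(team_id = team_id)
  # output is a list; names(output) shows every available component.
  output$coach
}

# Example call (requires network access):
# season_coach("585290")
```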
The NCAA's main page for a team includes a tab called "Game By Game" and a section called "Career Totals". Though the page only shows one season's worth of information, this function extracts season summary stats starting with 2001. We have included the conference starting with 2020 (conference data for previous seasons is not currently available).
team_season_stats(team = NULL, opponent = FALSE, sport = "WVB")
team |
Name of school. Must match name used by NCAA. Find exact team
name with |
opponent |
Logical indicating whether to include team's stats (FALSE) or opponent's stats (TRUE). Default is set to FALSE, returning team stats. |
sport |
Three letter abbreviation for NCAA sport (must be upper case; for example "WVB" for women's volleyball and "MVB" for men's volleyball). |
Returns a data frame of summary team statistics for each season.
Other functions that extract team statistics: team_match_stats(), team_season_info()
team_season_stats(team = "Nebraska")
This data frame includes all women's NCAA Division 1, 2, and 3 teams from 2020-2024.
wvb_teams
A data frame with 5,289 rows and 6 columns:
Team ID for season/year
Team name
Conference ID
Conference name
NCAA division number (1, 2, or 3)
Year for fall of season
Other data sets: mvb_teams, ncaa_conferences, ncaa_sports, ncaa_teams
head(wvb_teams)