Title: | Standardized Git Repository Data |
---|---|
Description: | Obtain standardized data from multiple 'Git' services, including 'GitHub' and 'GitLab'. Designed to be 'Git' service-agnostic, this package assists teams with activities spread across various 'Git' platforms by providing a unified way to access repository data. |
Authors: | Maciej Banas [aut, cre], Kamil Koziej [aut], Karolina Marcinkowska [aut], Matt Secrest [aut] |
Maintainer: | Maciej Banas <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.2.2 |
Built: | 2025-02-11 13:29:48 UTC |
Source: | CRAN |
Create a GitStats object.
create_gitstats()
A GitStats object.
my_gitstats <- create_gitstats()
List all commits from all repositories for an organization or a vector of repositories.
get_commits(gitstats, since = NULL, until = Sys.Date() + lubridate::days(1), cache = TRUE, verbose = is_verbose(gitstats), progress = verbose)
gitstats | A GitStats object. |
since | A starting date. |
until | An end date. |
cache | A logical, if set to |
verbose | A logical, |
progress | A logical, by default set to |
A data.frame.
## Not run:
my_gitstats <- create_gitstats() %>%
  set_github_host(
    token = Sys.getenv("GITHUB_PAT"),
    repos = c("openpharma/DataFakeR", "openpharma/visR")
  ) %>%
  set_gitlab_host(
    token = Sys.getenv("GITLAB_PAT_PUBLIC"),
    orgs = "mbtests"
  )
get_commits(my_gitstats, since = "2018-01-01")
## End(Not run)
Prepare statistics from the pulled commits data.
get_commits_stats(commits, time_aggregation = c("year", "month", "week", "day"), group_var)
commits | A |
time_aggregation | A character, specifying time aggregation of statistics. |
group_var | Other grouping variable to be passed to |
For this function to work, you first need to get commits data with GitStats. See the examples section.
A table of commits_stats class.
## Not run:
my_gitstats <- create_gitstats() %>%
  set_github_host(
    token = Sys.getenv("GITHUB_PAT"),
    repos = c("r-world-devs/GitStats", "openpharma/visR")
  ) |>
  # the piped GitStats object is the first argument, so it is not passed again
  get_commits(since = "2022-01-01") |>
  get_commits_stats(
    time_aggregation = "year",
    group_var = author
  )
## End(Not run)
Pulls text files and their content.
get_files(gitstats, pattern = NULL, depth = Inf, file_path = NULL, cache = TRUE, verbose = is_verbose(gitstats), progress = verbose)
gitstats | A |
pattern | A regular expression. If defined, it pulls content of all files in a repository matching this pattern reaching to the level of directories defined by |
depth | Defines level of directories to retrieve files from. E.g. if set to |
file_path | A specific path to file(s) in repositories. May be a character vector if multiple files are to be pulled. If defined, the function pulls content of this specific |
cache | A logical, if set to |
verbose | A logical, |
progress | A logical, by default set to |
get_files() may be used in two ways: either with the pattern argument (and, optionally, depth) or with the file_path argument defined.

In the first scenario, GitStats first pulls a file structure corresponding to the passed pattern and depth arguments, and afterwards the content of all of these files. In the second scenario, GitStats pulls only the content of the files at the specific file_path of the repository.

If the user wants to pull a particular file or files, the file_path approach is more reasonable: it is faster, since it omits pulling the whole file structure from the repo. For example, if the user wants to pull the content of README.md and/or NEWS.md files placed in the root directories of the repositories, they should take the file_path approach, as they already know the precise paths of the files.

On the other hand, if the user wants to pull a specific type of files (e.g. all .md or .Rmd files in the repository) without knowing their paths, the pattern approach is recommended. It triggers GitStats to find all matching files in the repository down to the given level of directories (depth argument) and afterwards pull their content.

The latter approach is slower than the former but may be more useful depending on the user's goals. Both approaches return data in the same format: a tibble with data on files, namely their path and their content.
A data.frame.
## Not run:
git_stats <- create_gitstats() |>
  set_github_host(
    token = Sys.getenv("GITHUB_PAT"),
    orgs = c("r-world-devs")
  ) |>
  set_gitlab_host(
    token = Sys.getenv("GITLAB_PAT_PUBLIC"),
    orgs = "mbtests"
  )

rmd_files <- get_files(
  gitstats = git_stats,
  pattern = "\\.Rmd",
  depth = 2L
)

app_files <- get_files(
  gitstats = git_stats,
  file_path = c("R/app.R", "R/ui.R", "R/server.R")
)
## End(Not run)
Wrapper over searching repositories by code blobs related to loading a package (library(package) and require(package) in all files) or using it as a dependency (package in DESCRIPTION and NAMESPACE files).
get_R_package_usage(gitstats, packages, only_loading = FALSE, split_output = FALSE, cache = TRUE, verbose = is_verbose(gitstats))
gitstats | A GitStats object. |
packages | A character vector, names of R packages to look for. |
only_loading | A boolean, if |
split_output | Optional, a boolean. If |
cache | A logical, if set to |
verbose | A logical, |
A tibble or list of tibbles, depending on the split_output parameter.
## Not run:
my_gitstats <- create_gitstats() %>%
  set_github_host(
    token = Sys.getenv("GITHUB_PAT"),
    orgs = c("r-world-devs", "openpharma")
  )

get_R_package_usage(
  gitstats = my_gitstats,
  packages = c("purrr", "shiny"),
  split_output = TRUE
)
## End(Not run)
Pull release logs from repositories.
get_release_logs(gitstats, since = NULL, until = Sys.Date() + lubridate::days(1), cache = TRUE, verbose = is_verbose(gitstats), progress = verbose)
gitstats | A GitStats object. |
since | A starting date. |
until | An end date. |
cache | A logical, if set to |
verbose | A logical, |
progress | A logical, by default set to |
A data.frame.
## Not run:
my_gitstats <- create_gitstats() %>%
  set_github_host(
    token = Sys.getenv("GITHUB_PAT"),
    orgs = c("r-world-devs", "openpharma")
  )
get_release_logs(my_gitstats, since = "2024-01-01")
## End(Not run)
Pulls data on all repositories for an organization, an individual user, or those with a given text in code blobs (with_code parameter) or a given file (with_files parameter), and parses it into table format.
get_repos(gitstats, add_contributors = TRUE, with_code = NULL, in_files = NULL, with_files = NULL, cache = TRUE, verbose = is_verbose(gitstats), progress = verbose)
gitstats | A GitStats object. |
add_contributors | A logical parameter to decide whether to add information about repositories' contributors to the repositories output (table). If set to |
with_code | A character vector, if defined, GitStats will pull repositories with specified code phrases in code blobs. |
in_files | A character vector of file names. Works when |
with_files | A character vector, if defined, GitStats will pull repositories with specified files. |
cache | A logical, if set to |
verbose | A logical, |
progress | A logical, by default set to |
A data.frame.
## Not run:
my_gitstats <- create_gitstats() %>%
  set_github_host(
    token = Sys.getenv("GITHUB_PAT"),
    orgs = c("r-world-devs", "openpharma")
  ) %>%
  set_gitlab_host(
    token = Sys.getenv("GITLAB_PAT_PUBLIC"),
    orgs = "mbtests"
  )
get_repos(my_gitstats)
get_repos(my_gitstats, add_contributors = FALSE)
get_repos(my_gitstats, with_code = "Shiny", in_files = "renv.lock")
get_repos(my_gitstats, with_files = "DESCRIPTION")
## End(Not run)
Pulls a vector of repository URLs (web or API): either all for an organization or those with a given text in code blobs (with_code parameter) or a given file (with_files parameter).
get_repos_urls(gitstats, type = "api", with_code = NULL, in_files = NULL, with_files = NULL, cache = TRUE, verbose = is_verbose(gitstats), progress = verbose)
gitstats | A GitStats object. |
type | A character, choose if |
with_code | A character vector, if defined, |
in_files | A character vector of file names. Works when |
with_files | A character vector, if defined, GitStats will pull repositories with specified files. |
cache | A logical, if set to |
verbose | A logical, |
progress | A logical, by default set to |
A character vector.
## Not run:
my_gitstats <- create_gitstats() %>%
  set_github_host(
    token = Sys.getenv("GITHUB_PAT"),
    orgs = c("r-world-devs", "openpharma")
  ) %>%
  set_gitlab_host(
    token = Sys.getenv("GITLAB_PAT_PUBLIC"),
    orgs = "mbtests"
  )
get_repos_urls(my_gitstats, with_files = c("DESCRIPTION", "LICENSE"))
## End(Not run)
Retrieves whole or particular data (see storage parameter) pulled earlier with GitStats.
get_storage(gitstats, storage = NULL)
gitstats | A GitStats object. |
storage | A character, type of the data you want to get from storage: |
A list of tibbles (if storage is set to NULL) or a tibble (if storage is defined).
## Not run:
my_gitstats <- create_gitstats() %>%
  set_github_host(
    token = Sys.getenv("GITHUB_PAT"),
    orgs = c("r-world-devs", "openpharma")
  )
get_release_logs(my_gitstats, since = "2024-01-01")
get_repos(my_gitstats)

release_logs <- get_storage(
  gitstats = my_gitstats,
  storage = "release_logs"
)
## End(Not run)
Get users data
get_users(gitstats, logins, cache = TRUE, verbose = is_verbose(gitstats))
gitstats | A GitStats object. |
logins | A character vector of logins. |
cache | A logical, if set to |
verbose | A logical, |
A data.frame.
## Not run:
my_gitstats <- create_gitstats() %>%
  set_github_host(
    token = Sys.getenv("GITHUB_PAT"),
    orgs = c("r-world-devs")
  ) %>%
  set_gitlab_host(
    token = Sys.getenv("GITLAB_PAT_PUBLIC"),
    orgs = "mbtests"
  )
get_users(my_gitstats, c("maciekabanas", "marcinkowskak"))
## End(Not run)
Is verbose mode switched on
is_verbose(gitstats)
gitstats | A GitStats object. |
Set GitHub host
set_github_host(gitstats, host = NULL, token = NULL, orgs = NULL, repos = NULL, verbose = is_verbose(gitstats), .error = TRUE)
gitstats | A GitStats object. |
host | A character, optional, URL name of the host. If not passed, a public host will be used. |
token | A token. |
orgs | An optional character vector of organisations. If you pass it, |
repos | An optional character vector of repositories full names (organization and repository name, e.g. "r-world-devs/GitStats"). If you pass it, |
verbose | A logical, |
.error | A logical to control if passing wrong input ( |
If you do not define orgs and repos, GitStats will be set to scan the whole Git platform (such as an enterprise version of GitHub or GitLab), unless it is a public platform. In the case of a public one (like GitHub), you need to define orgs or repos, as scanning through all organizations may take a large amount of time.
A GitStats object with added information on host.
## Not run:
my_gitstats <- create_gitstats() %>%
  set_github_host(
    orgs = c("r-world-devs", "openpharma", "pharmaverse")
  )
## End(Not run)
Set GitLab host
set_gitlab_host(gitstats, host = NULL, token = NULL, orgs = NULL, repos = NULL, verbose = is_verbose(gitstats), .error = TRUE)
gitstats | A GitStats object. |
host | A character, optional, URL name of the host. If not passed, a public host will be used. |
token | A token. |
orgs | An optional character vector of organisations. If you pass it, |
repos | An optional character vector of repositories full names (organization and repository name, e.g. "r-world-devs/GitStats"). If you pass it, |
verbose | A logical, |
.error | A logical to control if passing wrong input ( |
If you do not define orgs and repos, GitStats will be set to scan the whole Git platform (such as an enterprise version of GitHub or GitLab), unless it is a public platform. In the case of a public one (like GitHub), you need to define orgs or repos, as scanning through all organizations may take a large amount of time.
A GitStats object with added information on host.
## Not run:
my_gitstats <- create_gitstats() %>%
  set_gitlab_host(
    token = Sys.getenv("GITLAB_PAT_PUBLIC"),
    orgs = "mbtests"
  )
## End(Not run)
Retrieves organizations set or pulled by GitStats. Especially helpful when the user is scanning a whole Git platform and wants to have a glimpse at organizations.
show_orgs(gitstats)
gitstats | A GitStats object. |
A vector of organizations.
Stop printing messages and output.
verbose_off(gitstats)
gitstats | A GitStats object. |
A GitStats object.
Print all messages and output.
verbose_on(gitstats)
gitstats | A GitStats object. |
A GitStats object.
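The verbose helpers above (verbose_off(), verbose_on() and is_verbose()) come without an example, so here is a minimal sketch of how they might be combined. It assumes that is_verbose() returns a single logical, which this page implies but does not state, and that a GitStats object created without any host can be toggled. The names verbose_states, state_after_off and state_after_on are illustrative only.

```r
# Sketch only: the block is skipped when the GitStats package is not installed.
verbose_states <- NULL

if (requireNamespace("GitStats", quietly = TRUE)) {
  library(GitStats)

  # No host is set, so no token or network access should be needed here.
  my_gitstats <- create_gitstats()

  # Switch messages off and record the reported state
  # (assumption: is_verbose() returns a logical).
  my_gitstats <- verbose_off(my_gitstats)
  state_after_off <- is_verbose(my_gitstats)

  # Switch messages back on and record the state again.
  my_gitstats <- verbose_on(my_gitstats)
  state_after_on <- is_verbose(my_gitstats)

  verbose_states <- c(off = state_after_off, on = state_after_on)
  print(verbose_states)
}
```

Because verbose defaults to is_verbose(gitstats) in the get_*() and set_*_host() signatures shown earlier, toggling the object once should presumably change the default for later calls, while any single call can still pass verbose explicitly.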