| Title: | Concise and Efficient Tools for Everyday Statistical Production |
|---|---|
| Description: | A set of concise and efficient tools for statistical production that can also be used for data management. In statistical production, you deal with complex data and need to control your process at each step of your work. Concise functions are very helpful, because you do not hesitate to use them. The package includes the following functions: 'dup' checks for duplicates; 'miss' checks for missing values; 'tac' computes a contingency table of all columns; 'toc' compares two tables, spotting significant deviations; 'chi2_find' (a more complex function) compares columns within a data.frame, spotting related categories. |
| Authors: | Vincent Reduron [cre, aut] |
| Maintainer: | Vincent Reduron <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.2 |
| Built: | 2026-05-19 14:02:30 UTC |
| Source: | https://github.com/cran/ProduceR |
Shorthand for 'paste0()'

a %+% b

| Argument | Description |
|---|---|
| a | string |
| b | string |

Returns: string
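As a rough illustration of what a `paste0()` shorthand looks like, the operator can be sketched in base R as follows. This is a hypothetical stand-in written for this example, assuming the operator simply forwards its two operands to `paste0()`; it is not taken from the package source.

```r
# Hypothetical stand-in, not the package's source code:
# an infix operator that forwards both operands to paste0().
`%+%` <- function(a, b) paste0(a, b)

# Infix operators are left-associative, so calls can be chained.
file_name <- "base_eu" %+% "_" %+% "2025"
print(file_name)

# paste0() is vectorized, so the sketch is vectorized too.
suffixed <- c("a", "b") %+% "_key"
print(suffixed)
```

Chaining reads left to right, which is the main convenience over nested `paste0()` calls.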
Description.
base_eu_2024
A data frame with 232 rows and 11 variables:
character
character
integer
integer
numeric
numeric
numeric
numeric
numeric
numeric
numeric
Source
Description.
base_eu_2025
A data frame with 295 rows and 11 variables:
character
character
integer
integer
numeric
numeric
numeric
numeric
numeric
numeric
numeric
Source
Find modalities related to a criterion
chi2_find(df, criterion)

| Argument | Description |
|---|---|
| df | data.frame |
| criterion | character string: criterion that spots target rows |

Returns: data.frame
Create vector of df's column types. Similar to colnames(), but with column types instead of names.
coltypes(df)

| Argument | Description |
|---|---|
| df | data.frame |

Returns: vector
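The idea of a "column types" counterpart to `colnames()` can be sketched in base R. The helper name `column_types` is made up for this example, and keeping only the first class of each column is an assumption; the package's `coltypes()` may format its result differently.

```r
# Hypothetical helper, not the package's implementation:
# return the first class of every column as a character vector.
column_types <- function(df) {
  vapply(df, function(col) class(col)[1], character(1))
}

types <- column_types(iris)
print(types)
```

The result is named by column, so `types[["Species"]]` looks up a single column's type.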
Creates multiple result tables. The term "n-plicate" generalizes the notion of a duplicate: an n-plicate can be a duplicate, a triplicate, etc.
dup(tab, keyby = NULL, count_what = "rows", partition = NULL, view = TRUE, nb_xmpl = 51)

| Argument | Description |
|---|---|
| tab | Either an R dataframe or a reference to a remote table ("remote table") |
| keyby | (character vector) names of the column(s) considered as keys |
| count_what | (character vector) what to count for each key (each *keyby* value): 'rows' to count distinct rows, otherwise the names of the columns whose distinct values are to be counted |
| partition | (character vector) names of the columns by which to break down the analysis |
| view | automatic opening of generated tables |
| nb_xmpl | number of duplicate examples displayed in table |
Returns: a set of dataframes in the global environment.

* nup_r_tab: table of n-plicate counts
* nup_xmpl_dupl: table of examples of n-plicates
* nup_xmpl_nakey: table of examples of NA keys (n-plicates with value 0)
* nup_r_tab_part: table of n-plicate counts broken down by the modalities of the 'partition' columns
Examples:

# Check if "name" is a unique key of the starwars table (yes!)
dup(dplyr::starwars, keyby = "name", view = FALSE)
# Check if "key" is a unique key of the basic table (no!)
basic <- data.frame("key" = c("a", "b", "c", "d", NA, "a", "e", "f"), "value" = c(112, 117, 317, NA, 0, 17, 117, 112))
dup(basic, keyby = "key", view = FALSE)
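The notion of counting n-plicates can be illustrated in base R on the same `basic` table. This is only a sketch of the concept, not how `dup` is implemented; the variable names are invented for the example, and the handling of NA keys (counted separately) is an assumption.

```r
basic <- data.frame(
  key = c("a", "b", "c", "d", NA, "a", "e", "f"),
  value = c(112, 117, 317, NA, 0, 17, 117, 112)
)

# Number of rows per non-missing key value.
non_na_keys <- basic$key[!is.na(basic$key)]
rows_per_key <- table(non_na_keys)

# Distribution of n-plicates: how many keys occur once, twice, etc.
nplicates <- table(as.vector(rows_per_key))
print(nplicates)

# Keys that are not unique, and the number of rows with a missing key.
duplicated_keys <- names(rows_per_key)[rows_per_key > 1]
n_na_keys <- sum(is.na(basic$key))
print(duplicated_keys)
print(n_na_keys)
```

Here the key "a" appears twice, which is why "key" is not a unique key of `basic`.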
Get the recursion depth of a list

get_recursion_depth(x, depth = 0)

| Argument | Description |
|---|---|
| x | input list |
| depth | depth of x in another list (1 if x is in a list, 2 if x is in a list of lists, etc.) |

Returns: integer
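A recursive depth computation of this kind can be sketched in base R as below. The function name `list_depth` and its counting convention (0 for a non-list, 1 for a flat list, including an empty one) are assumptions made for the illustration; the package's function may count differently.

```r
# Hypothetical re-implementation for illustration only.
# depth tracks how many lists enclose x so far.
list_depth <- function(x, depth = 0) {
  if (!is.list(x)) return(depth)          # leaf: report the depth reached
  if (length(x) == 0) return(depth + 1)   # an empty list still counts as one level
  # Recurse into every element, one level deeper, and keep the maximum.
  max(vapply(x, list_depth, numeric(1), depth = depth + 1))
}

flat <- list_depth(list(1, 2))
nested <- list_depth(list(1, list(2, list(3))))
print(c(flat = flat, nested = nested))
```

Note that data frames are lists in R, so this sketch treats a data frame as a list of depth 1.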
Contingency table for column 'col_name' in data.frame 'df'
get_tac_column(df, col_name, values, strates)

| Argument | Description |
|---|---|
| df | Input data.frame |
| col_name | string: name of the column for which to generate the contingency table |
| values | Vector of columns that serve as measures (amounts, counts, etc.) |
| strates | Vector of column names by which to stratify the contingency tables |
Returns TRUE or FALSE depending on whether its argument is of Date type or not
is.Date(x)

| Argument | Description |
|---|---|
| x | object |

Returns: TRUE/FALSE
Returns TRUE or FALSE depending on whether its argument is of POSIXct type or not
is.POSIXct(x)

| Argument | Description |
|---|---|
| x | object |

Returns: TRUE/FALSE
Returns TRUE or FALSE depending on whether its argument is of POSIXlt type or not
is.POSIXlt(x)

| Argument | Description |
|---|---|
| x | object |

Returns: TRUE/FALSE
Returns TRUE or FALSE depending on whether its argument is of POSIXt type or not

is.POSIXt(x)

| Argument | Description |
|---|---|
| x | object |

Returns: TRUE/FALSE
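Type predicates like the four above are commonly written with base R's `inherits()`. The sketch below uses differently named helpers (`is_date`, `is_posixct`, and so on) to make clear that it is an illustration under that assumption, not the package's source.

```r
# Hypothetical helpers built on inherits(); names chosen for this example.
is_date    <- function(x) inherits(x, "Date")
is_posixct <- function(x) inherits(x, "POSIXct")
is_posixlt <- function(x) inherits(x, "POSIXlt")
is_posixt  <- function(x) inherits(x, "POSIXt")

day <- as.Date("2025-01-31")
instant_ct <- as.POSIXct("2025-01-31 12:00:00", tz = "UTC")
instant_lt <- as.POSIXlt(instant_ct)

print(is_date(day))           # a Date object
print(is_posixct(instant_ct)) # a POSIXct object
print(is_posixt(instant_lt))  # POSIXlt objects also carry the POSIXt class
print(is_date(instant_ct))    # a date-time is not a Date
```

Both POSIXct and POSIXlt objects carry the virtual class POSIXt, so the last predicate covers either representation.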
Get a synthetic table of missing values for all columns of a data.frame
miss(df, values = NULL, view = FALSE)

| Argument | Description |
|---|---|
| df | data.frame: Input data.frame |
| values | column: Variable (~weight) to measure the number of missing values (otherwise, count of rows) |
| view | boolean: Display a glimpse of cases with NA values |

Returns: data.frame

Examples:

miss(mtcars) # Checking NA values for all columns of mtcars (none)
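For comparison, a per-column summary of missing values can be sketched in base R. It reuses the small `basic` table from the `dup` examples because `mtcars` has no missing values; the output columns (`n_missing`, `pct_missing`) are invented for the sketch and are not the layout returned by `miss`.

```r
basic <- data.frame(
  key = c("a", "b", "c", "d", NA, "a", "e", "f"),
  value = c(112, 117, 317, NA, 0, 17, 117, 112)
)

# is.na() on a data frame yields a logical matrix; colSums() counts TRUE per column.
na_counts <- colSums(is.na(basic))

na_summary <- data.frame(
  column = names(na_counts),
  n_missing = as.integer(na_counts),
  pct_missing = 100 * as.numeric(na_counts) / nrow(basic)
)
print(na_summary)
```

This counts rows only; weighting by a measure column, as the `values` argument suggests, is not covered by the sketch.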
Contingency table (tac) of all columns in a dataframe for control purposes
tac(df, values = NULL, sample_rate = 0.01, num_but_discrete = "NULL", strates = NULL)

| Argument | Description |
|---|---|
| df | Input data.frame |
| values | Vector of columns that serve as measures (amounts, counts, etc.) |
| sample_rate | Sampling rate, if df is a remote table |
| num_but_discrete | Vector of names of numeric columns with discrete modalities (not continuous) |
| strates | Vector of column names by which to stratify the contingency tables |

Returns: data.frame

Examples:

tab <- tac(iris) # calculate column frequencies
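To give a feel for what "a contingency table of all columns" can look like, here is a base R sketch that stacks the frequency tables of every non-numeric column into one long data frame. The helper name `column_frequencies`, the output layout, and the choice to skip numeric columns entirely are all assumptions of this illustration; `tac` itself may treat numeric columns and measures differently.

```r
# Hypothetical helper, not the package's implementation.
column_frequencies <- function(df) {
  # Keep only non-numeric columns (an assumption made for this sketch).
  discrete <- df[!vapply(df, is.numeric, logical(1))]
  tables <- lapply(names(discrete), function(col) {
    counts <- table(discrete[[col]], useNA = "ifany")
    data.frame(
      column = col,
      modality = names(counts),
      n = as.integer(counts),
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-column tables into one long data frame.
  do.call(rbind, tables)
}

freq <- column_frequencies(iris)
print(freq)
```

With `iris`, only `Species` is non-numeric, so the result has one row per species.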
Generalized detection of outlier values in a database, based on contingency tables (tac)
toc(df1, df2, values = NULL, a = 10, r = 0.34, sample_rate = 0.01, num_but_discrete = "NULL")

| Argument | Description |
|---|---|
| df1 | Input data.frame (to compare with df2) |
| df2 | Input data.frame (to compare with df1) |
| values | Vector of columns that serve as measures (amounts, counts, etc.) |
| a | Allowed absolute variation |
| r | Allowed relative variation |
| sample_rate | Sampling rate, if df is a remote table |
| num_but_discrete | Numeric variables to be treated as discrete modal variables. If 'all', all numeric variables are treated as discrete modal variables. |

Returns: data.frame
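The roles of an absolute and a relative tolerance can be illustrated with a toy comparison of category counts. Everything here is an assumption made for the sketch: the counts are made up, and the rule "flag a category only when both tolerances are exceeded" is one plausible way of combining `a` and `r`, not necessarily the rule `toc` applies.

```r
# Made-up category counts from two versions of a table.
counts_1 <- c(A = 100, B = 50, C = 12)
counts_2 <- c(A = 104, B = 90, C = 15)

a <- 10    # allowed absolute variation (same value as the default above)
r <- 0.34  # allowed relative variation (same value as the default above)

abs_change <- abs(counts_2 - counts_1)
rel_change <- abs_change / counts_1

# Assumed rule: a deviation is notable only if it exceeds both tolerances.
flagged <- names(counts_1)[abs_change > a & rel_change > r]
print(flagged)
```

Under this rule, small categories that swing by a few units and large categories that move by a small fraction are both left alone.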
Difference score between x and y (0 = no significant difference, >0 = presence of significant difference)
toc_score(x, y, a)

| Argument | Description |
|---|---|
| x | (num) First value to compare |
| y | (num) Second value to compare |
| a | (num) Absolute difference threshold below which all differences are considered normal |

Returns: numeric

Examples:

toc_score(15, 1500, a = 500) # 1.91: significant difference
toc_score(1432, 1501, a = 100) # 0: non-significant difference