Title: | Make and Apply Customized Rounding Specifications for Tables |
---|---|
Description: | Translate double and integer valued data into character values formatted for tabulation in manuscripts or other types of academic reports. |
Authors: | Byron Jaeger [aut, cre] |
Maintainer: | Byron Jaeger <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.5 |
Built: | 2024-11-25 16:42:58 UTC |
Source: | CRAN |
Convert table data to inline list
as_inline(data, tbl_variables, tbl_values)
data |
a data frame. |
tbl_variables |
column names that will be used to form groups in the table |
tbl_values |
column names that contain table values. |
a list of tbl_values values for each permutation of tbl_variables.
Variables in tbl_variables that have missing values will have their missing values converted into an explicit category named variable_missing, where 'variable' is the name of the variable.
example_data <- data.frame(
  sex = c("female", "male"),
  height = c("158 (154 - 161)", "178 (175 - 188)")
)

as_inline(example_data, tbl_variables = 'sex', tbl_values = 'height')

car_data <- mtcars
car_data$car_name <- rownames(mtcars)

as_inline(car_data, tbl_variables = 'car_name', tbl_values = 'mpg')
If you have table values that take the form point estimate (uncertainty estimate), you can use these functions to access specific parts of the table value.
bracket_drop(x, bracket_left = "(", bracket_right = ")")

bracket_extract(
  x,
  bracket_left = "(",
  bracket_right = ")",
  drop_bracket = FALSE
)

bracket_insert_left(x, string, bracket_left = "(", bracket_right = ")")

bracket_insert_right(x, string, bracket_left = "(", bracket_right = ")")

bracket_point_estimate(x, bracket_left = "(", bracket_right = ")")

bracket_lower_bound(
  x,
  bracket_left = "(",
  separator = ",",
  bracket_right = ")"
)

bracket_upper_bound(
  x,
  bracket_left = "(",
  separator = ",",
  bracket_right = ")"
)
x |
a character vector where each value contains a point estimate and confidence limits. |
bracket_left |
a character value specifying what symbol is used to bracket the left hand side of the confidence interval |
bracket_right |
a character value specifying what symbol is used to bracket the right hand side of the confidence interval |
drop_bracket |
a logical value ( |
string |
a character value of a string that will be inserted into the left or right side of the bracket. |
separator |
a character value specifying what symbol is used to separate the lower and upper bounds of the interval. |
a character value with length equal to the length of x.
tbl_value <- "12.1 (95% CI: 9.1, 15.1)"
bracket_drop(tbl_value)
bracket_point_estimate(tbl_value)
bracket_extract(tbl_value, drop_bracket = TRUE)
bracket_lower_bound(tbl_value)
bracket_upper_bound(tbl_value)
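To make the idea of splitting a table value into its parts concrete, here is a minimal base R sketch. It is not the package's implementation; the variable names and regular expressions are assumptions chosen for illustration, and the input is a simplified value with no label inside the brackets.

```r
# Illustration only: split "estimate (lower, upper)" with base R.
tbl_value <- "12.1 (9.1, 15.1)"

# everything before the opening bracket is the point estimate
point <- trimws(sub("\\(.*$", "", tbl_value))

# everything between the brackets is the interval
inside <- sub("^.*\\((.*)\\).*$", "\\1", tbl_value)

# split the interval on the separator to get the two bounds
bounds <- trimws(strsplit(inside, ",", fixed = TRUE)[[1]])

point   # "12.1"
bounds  # "9.1" "15.1"
```

The bracket_ functions cover the same ground with configurable bracket and separator symbols, so a sketch like this is mainly useful for understanding what each helper returns.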
Values to the left of the decimal are generally called 'big' since they are larger than values to the right of the decimal. format_big() lets you update the settings of a rounding_specification object (see round_spec) so that values left of the decimal will be printed with a specific format (see examples).
format_big(rspec, mark = ",", interval = 3L)
rspec |
a |
mark |
a character value used to separate number groups to the left of the decimal point. See prettyNum for more details on this. Set this input to '' to negate its effect. |
interval |
a numeric value indicating the size of number groups for numbers left of the decimal. |
an object of class rounding_specification.
big_x <- 1234567
rspec <- format_big(round_spec(), mark = '|', interval = 3)
table_value(big_x, rspec) # returns "1|234|567"
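Since the mark argument points to prettyNum for details, it can help to see the corresponding base R behavior directly. The sketch below uses only base R; that mark and interval map onto big.mark and big.interval is an assumption based on the argument descriptions above.

```r
# Base R grouping of digits left of the decimal point
big_x <- 1234567

piped  <- prettyNum(big_x, big.mark = "|")  # "1|234|567"
commas <- format(big_x, big.mark = ",")     # "1,234,567"

piped
commas
```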
format_decimal() lets you update the settings of a rounding_specification object (see round_spec) so that the decimal is represented by a user-specified mark.
format_decimal(rspec, mark = ".")
rspec |
a |
mark |
a character value used to represent the decimal point. |
an object of class rounding_specification.
Other formatting helpers:
format_small()
small_x <- 0.1234567
rspec <- round_spec()
rspec <- round_using_decimal(rspec, digits = 7)
rspec <- format_decimal(rspec, mark = '*')
table_value(small_x, rspec)
format_missing() updates a rounding_specification object so that missing values are printed as the user specifies.
format_missing(rspec, replace_na_with)
rspec |
a |
replace_na_with |
a character value that replaces missing values. |
an object of class rounding_specification.
rspec <- round_spec()
rspec <- format_missing(rspec, 'oh no!')
table_value(x = c(pi, NA), rspec)
Values to the right of the decimal are generally called 'small' since they are smaller than values to the left of the decimal. format_small() lets you update the settings of a rounding_specification object (see round_spec) so that values right of the decimal will be printed with a specific format (see examples).
format_small(rspec, mark = "", interval = 5L)
rspec |
a |
mark |
a character value used to separate number groups to the right of the decimal point. See prettyNum for more details on this. Set this input to '' to negate its effect. |
interval |
a numeric value indicating the size of number groups for numbers right of the decimal. |
an object of class rounding_specification.
Other formatting helpers:
format_decimal()
small_x <- 0.1234567
rspec <- round_spec()
rspec <- round_using_decimal(rspec, digits = 7)
rspec <- format_small(rspec, mark = '*', interval = 1)
table_value(small_x, rspec)
The US National Health and Nutrition Examination Survey (NHANES) was designed to assess the health and nutritional status of the non-institutionalized US population and was conducted by the National Center for Health Statistics of the Centers for Disease Control and Prevention (1). Since 1999-2000, NHANES has been conducted in two-year cycles using a multistage probability sampling design to select participants. Each cycle is independent with different participants recruited.
nhanes
A data frame with columns:
NHANES exam: 2013-2014, 2015-2016, or 2017-2018
survey participant identifier
primary sampling unit
survey strata
2 year mobile examination weights
exam status. Participants either completed both the NHANES interview and exam or just the interview.
participant's age, in years
participant's sex
participant's race and ethnicity
participant's education
pregnancy status
participant's systolic blood pressure, mm Hg
participant's diastolic blood pressure, mm Hg
the number of valid systolic blood pressure readings
the number of valid diastolic blood pressure readings
was participant ever told they had high blood pressure by a medical professional?
is participant currently using medication to lower their blood pressure?
Blood pressure measurements
The same protocol was followed to measure systolic and diastolic blood pressure (SBP and DBP) in each NHANES cycle. After survey participants had rested 5 minutes, their BP was measured by a trained physician using a mercury sphygmomanometer and an appropriately sized cuff. Three BP measurements were obtained at 30 second intervals.
NHANES website, https://www.cdc.gov/nchs/nhanes/index.htm
National health and nutrition examination survey homepage, available at https://www.cdc.gov/nchs/nhanes/index.htm. Accessed on 09/07/2020.
nhanes
Rounding a number x to the nearest integer requires some tie-breaking rule for those cases when x is exactly half-way between two integers, that is, when the fraction part of x is exactly 0.5. The round_half_up() function implements a tie-breaking rule that consistently rounds half units upward. Although this creates a slight bias toward larger rounded outputs, it is widely used in many disciplines. The round_half_even() function breaks ties by rounding to the nearest even unit.
round_half_up(rspec)

round_half_even(rspec)
rspec |
a |
an object of class rounding_specification.
Other rounding helpers:
round_using_magnitude()
# note base R behavior rounds to even:
round(0.5) # --> 0
round(1.5) # --> 2
round(2.5) # --> 2

# make rspec that rounds up
rspec <- round_half_up(round_spec())
rspec <- round_using_decimal(rspec, digits = 0)

# check
table_value(0.5, rspec) # --> 1
table_value(1.5, rspec) # --> 2
table_value(2.5, rspec) # --> 3

# make rspec that rounds even
rspec <- round_half_even(round_spec())
rspec <- round_using_decimal(rspec, digits = 0)

# check
table_value(0.5, rspec) # --> 0
table_value(1.5, rspec) # --> 2
table_value(2.5, rspec) # --> 2
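The difference between the two tie-breaking rules can also be seen without the package. The helper below is hypothetical (its name and body are not taken from the package) and is a minimal sketch that assumes non-negative inputs whose halves are exactly representable as doubles.

```r
# Hypothetical helper: round half up for non-negative x
round_half_up_sketch <- function(x, digits = 0) {
  scale <- 10^digits
  # adding 0.5 and taking the floor pushes exact halves upward
  floor(x * scale + 0.5) / scale
}

ties <- c(0.5, 1.5, 2.5)

half_even <- round(ties)                 # base R breaks ties to even: 0 2 2
half_up   <- round_half_up_sketch(ties)  # ties always go up: 1 2 3

half_even
half_up
```

Values such as 0.285 cannot be stored exactly in floating point, so a naive helper like this one may not treat them as ties. That is one reason to prefer the package's rounding specifications over a hand-rolled rule.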
round_spec() creates a rounding specification object with default settings. The settings of a rounding specification object can be updated using functions in the round_ (see round_half_up, round_half_even, round_using_signif, round_using_decimal, and round_using_magnitude) and format_ (see format_missing, format_big, format_small, and format_decimal) families.
round_spec(force_default = FALSE)
force_default |
a logical value. If |
Rounding specifications are meant to be passed into the table_glue and table_value functions. The specification can also be passed into table_ functions implicitly by saving a rounding specification into the global options.
The round_spec() function intentionally uses no input arguments. This is to encourage users to develop rounding specifications using the round_ and format_ families in conjunction with the pipe (%>%) operator.
an object of class rounding_specification.
rspec <- round_spec()
table_value(x = pi, rspec)
These functions update a rounding_specification object (see round_spec) so that a particular approach to rounding is applied:
round to a dynamic decimal place based on magnitude of the rounded number (round_using_magnitude())
round to a specific number of significant digits (round_using_signif())
round to a specific decimal place (round_using_decimal())
round_using_magnitude(rspec, digits = c(2, 1, 0), breaks = c(1, 10, Inf))

round_using_signif(rspec, digits = 2)

round_using_decimal(rspec, digits = 1)
rspec |
a |
digits |
for |
breaks |
(only relevant if rounding based on magnitude) a positive, monotonically increasing numeric vector designating rounding boundaries. |
digits and breaks must be used in coordination with each other when rounding based on magnitude. For example, using breaks = c(1, 10, Inf) and digits = c(2, 1, 0),
numbers whose absolute value is < 1 are rounded to 2 decimal places,
numbers whose absolute value is >= 1 and < 10 are rounded to 1 decimal place, and
numbers whose absolute value is >= 10 are rounded to 0 decimal places.
The use of magnitude to guide rounding rules is flexible and can be used for many different applications (e.g., see table_pvalue). Rounding by magnitude is similar in some ways to rounding to a set number of significant digits but not entirely the same (see examples).
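The pairing of digits and breaks described above can be sketched in base R. The helper name and its body are assumptions made for illustration and are not the package's code; it only shows how each value is matched to a number of decimals by its magnitude.

```r
# Hypothetical helper: choose decimals from magnitude, then round
round_by_magnitude_sketch <- function(x,
                                      digits = c(2, 1, 0),
                                      breaks = c(1, 10, Inf)) {
  # findInterval() counts how many breaks are <= abs(x); adding 1
  # gives the position of the matching entry in `digits`
  index <- pmin(findInterval(abs(x), breaks) + 1, length(digits))
  d <- digits[index]
  # round and format each value with its own number of decimals
  unname(mapply(
    function(value, n) formatC(round(value, n), format = "f", digits = n),
    x, d
  ))
}

rounded <- round_by_magnitude_sketch(c(0.1234, 3.14159, 31.4159))
rounded
# returns "0.12" "3.1" "31"
```

With the default digits and breaks, values below 1 keep two decimals, values from 1 up to 10 keep one, and larger values keep none.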
an object of class rounding_specification.
Other rounding helpers:
round_half_up()
x <- c(pi, exp(1))
x <- c(x, x*10, x*100, x*1000)

# make one specification using each rounding approach
specs <- list(
  magnitude = round_using_magnitude(round_spec()),
  decimal = round_using_decimal(round_spec()),
  signif = round_using_signif(round_spec())
)

# apply all three rounding specifications to x
# notice how the rounding specifications are in agreement
# for smaller values of x but their answers are different
# for larger values of x.
sapply(specs, function(rspec) table_value(x, rspec))

# output:
#      magnitude decimal   signif
# [1,] "3.1"     "3.1"     "3.1"
# [2,] "2.7"     "2.7"     "2.7"
# [3,] "31"      "31.4"    "31.0"
# [4,] "27"      "27.2"    "27.0"
# [5,] "314"     "314.2"   "310.0"
# [6,] "272"     "271.8"   "270.0"
# [7,] "3,142"   "3,141.6" "3,100.0"
# [8,] "2,718"   "2,718.3" "2,700.0"
Though they are not easy to find in print, there are some general conventions for rounding numbers. When rounding a summary statistic such as the mean or median, the number of rounded digits shown should be governed by the precision of the statistic. For instance, authors are usually asked to present means plus or minus standard deviations in published research, or regression coefficients plus or minus the standard error. The convention applied here is to
find the place of the first significant digit of the error
round the estimate to that place
round the error to 1 additional place
present the combination in a form such as estimate (error) or estimate +/- error
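The first three steps of this convention can be sketched with base R arithmetic. The helper below is hypothetical and is not the package's implementation; it ignores the majority_rule option and any formatting, and only shows how the rounding place follows from the error.

```r
# Hypothetical helper: round an estimate and its error using the
# place of the first significant digit of the error
round_estimate_error_sketch <- function(estimate, error) {
  # 0 = ones place, 1 = tens place, -1 = tenths place, ...
  place <- floor(log10(abs(error)))
  list(
    estimate = round(estimate, digits = -place),
    error = round(error, digits = -place + 1)
  )
}

age <- round_estimate_error_sketch(72.17986, 9.364132)
cost <- round_estimate_error_sketch(72347.23, 23994.06)

age   # estimate 72, error 9.4
cost  # estimate 70000, error 24000
```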
table_ester(
  estimate,
  error,
  form = "{estimate} ± {error}",
  majority_rule = FALSE
)

table_estin(
  estimate,
  lower,
  upper,
  form = "{estimate} ({lower}, {upper})",
  majority_rule = FALSE
)
estimate |
a numeric vector of estimate values. |
error |
a numeric vector of error values. All errors should be >0. |
form |
a character value that indicates how the error and estimate
should be formatted together. Users can specify anything they like
as long as they use the terms |
majority_rule |
a logical value. If |
lower |
the lower-bound of an interval for the estimate. |
upper |
the upper-bound of an interval for the estimate. |
a character vector
Blackstone, Eugene H. "Rounding numbers" (2016): The Journal of Thoracic and Cardiovascular Surgery. DOI: https://doi.org/10.1016/j.jtcvs.2016.09.003
Other table helpers:
table_glue(), table_pvalue(), table_value()
# ---- examples are taken from Blackstone, 2016 ----

# Example 1: ----
# Mean age is 72.17986, and the standard deviation (SD) is 9.364132.
## Steps:
## - Nine is the first significant figure of the SD.
## - Nine is in the ones place. Thus...
##   + round the mean to the ones place (i.e., round(x, digits = 0))
##   + round the SD to the tenths place (i.e., round(x, digits = 1))
table_ester(estimate = 72.17986, error = 9.364132)
# > [1] 72 +/- 9.4

# an estimated lower and upper bound for 95% confidence limits
lower <- 72.17986 - 1.96 * 9.364132
upper <- 72.17986 + 1.96 * 9.364132
table_estin(estimate = 72.17986, lower = lower, upper = upper,
            form = "{estimate} (95% CI: {lower}, {upper})")
# > [1] "72 (95% CI: 54, 91)"

# Example 2: ----
# Mean cost is $72,347.23, and the standard deviation (SD) is $23,994.06.
## Steps:
## - Two is the first significant figure of the SD.
## - Two is in the ten thousands place. Thus...
##   + round mean to the 10-thousands place (i.e., round(x, digits = -4))
##   + round SD to the thousands place (i.e., round(x, digits = -3))
table_ester(estimate = 72347.23, error = 23994.06)
# > [1] "70,000 +/- 24,000"

# an estimated lower and upper bound for 95% confidence limits
lower <- 72347.23 - 1.96 * 23994.06
upper <- 72347.23 + 1.96 * 23994.06
table_estin(estimate = 72347.23, lower = lower, upper = upper,
            form = "{estimate} (95% CI: {lower} - {upper})")
# > [1] "70,000 (95% CI: 30,000 - 120,000)"
Expressive rounding for table values
table_glue(..., rspec = NULL, .sep = "", .envir = parent.frame())
... |
strings to round and format. Multiple inputs are concatenated together. Named arguments are not supported. |
rspec |
a |
.sep |
Separator used to separate elements |
.envir |
environment to evaluate each expression in. |
a character vector of length equal to the vectors supplied in ...
Other table helpers:
table_ester(), table_pvalue(), table_value()
x <- runif(10)
y <- runif(10)

table_glue("{x} / {y} = {x/y}")

table_glue("{x}", "({100 * y}%)", .sep = ' ')

df <- data.frame(x = 1:10, y = 1:10)

table_glue("{x} / {y} = {as.integer(x/y)}", .envir = df)
table_glue("{x} / {y} = {as.integer(x/y)}")
with(df, table_glue("{x} / {y} = {as.integer(x/y)}"))

mtcars$car <- rownames(mtcars)

# use the default rounding specification
table_glue(
  "the {car} gets ~{mpg} miles/gallon and weighs ~{wt} thousand lbs",
  .envir = mtcars[1:3, ]
)

# use your own rounding specification
rspec <- round_spec()
rspec <- round_using_decimal(rspec, digits = 1)

table_glue(
  "the {car} gets ~{mpg} miles/gallon and weighs ~{wt} thousand lbs",
  rspec = rspec,
  .envir = mtcars[1:3, ]
)
When presenting p-values, journals tend to request a lot of finessing. table_pvalue() is meant to do almost all of the finessing for you. The part it does not do is interpret the p-value. For that, please see the guideline on interpretation of p-values by the American Statistical Association (Wasserstein, 2016). The six main statements on p-value usage are included in the "Interpreting p-values" section below.
table_pvalue(
  x,
  round_half_to = "even",
  decimals_outer = 3L,
  decimals_inner = 2L,
  alpha = 0.05,
  bound_inner_low = 0.01,
  bound_inner_high = 0.99,
  bound_outer_low = 0.001,
  bound_outer_high = 0.999,
  miss_replace = "--",
  drop_leading_zero = TRUE
)
x |
a vector of numeric values. All values should be > 0 and < 1. |
round_half_to |
a character value indicating how to break ties when the rounded unit is exactly halfway between two rounding points. See round_half_even and round_half_up for details. Valid inputs are 'even' and 'up'. |
decimals_outer |
number of decimals to print when p > bound_outer_high or p < bound_outer_low. |
decimals_inner |
number of decimals to print when
|
alpha |
a numeric value indicating the significance level, i.e. the probability that you will make the mistake of rejecting the null hypothesis when it is true. |
bound_inner_low |
the lower bound of the inner range. |
bound_inner_high |
the upper bound of the inner range. |
bound_outer_low |
the lowest value printed. Values lower than the threshold will be printed as <threshold. |
bound_outer_high |
the highest value printed. Values higher than the threshold will be printed as >threshold. |
miss_replace |
a character value that replaces missing values. |
drop_leading_zero |
a logical value. If |
a character vector
The American Statistical Association (ASA) defines the p-value as follows:
A p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.
It then provides six principles to guide p-value usage:
P-values can indicate how incompatible the data are with a specified statistical model. A p-value provides one approach to summarizing the incompatibility between a particular set of data and a proposed model for the data. The most common context is a model, constructed under a set of assumptions, together with a so-called "null hypothesis". Often the null hypothesis postulates the absence of an effect, such as no difference between two groups, or the absence of a relationship between a factor and an outcome. The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis, if the underlying assumptions used to calculate the p-value hold. This incompatibility can be interpreted as casting doubt on or providing evidence against the null hypothesis or the underlying assumptions.
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone. Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold. Practices that reduce data analysis or scientific inference to mechanical "bright-line" rules (such as "p < 0.05") for justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision making. A conclusion does not immediately become "true" on one side of the divide and "false" on the other. Researchers should bring many contextual factors into play to derive scientific inferences, including the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis. Pragmatic considerations often require binary, "yes-no" decisions, but this does not mean that p-values alone can ensure that a decision is correct or incorrect. The widespread use of "statistical significance" (generally interpreted as "p<0.05") as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.
Proper inference requires full reporting and transparency. P-values and related analyses should not be reported selectively. Conducting multiple analyses of the data and reporting only those with certain p-values (typically those passing a significance threshold) renders the reported p-values essentially uninterpretable. Cherry picking promising findings, also known by such terms as data dredging, significance chasing, significance questing, selective inference, and "p-hacking," leads to a spurious excess of statistically significant results in the published literature and should be vigorously avoided. One need not formally carry out multiple statistical tests for this problem to arise: Whenever a researcher chooses what to present based on statistical results, valid interpretation of those results is severely compromised if the reader is not informed of the choice and its basis. Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted, and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.
A p-value, or statistical significance, does not measure the size of an effect or the importance of a result. Statistical significance is not equivalent to scientific, human, or economic significance. Smaller p-values do not necessarily imply the presence of larger or more important effects, and larger p-values do not imply a lack of importance or even lack of effect. Any effect, no matter how tiny, can produce a small p-value if the sample size or measurement precision is high enough, and large effects may produce unimpressive p-values if the sample size is small or measurements are imprecise. Similarly, identical estimated effects will have different p-values if the precision of the estimates differs.
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. Researchers should recognize that a p-value without context or other evidence provides limited information. For example, a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis. Likewise, a relatively large p-value does not imply evidence in favor of the null hypothesis; many other hypotheses may be equally or more consistent with the observed data. For these reasons, data analysis should not end with the calculation of a p-value when other approaches are appropriate and feasible.
Wasserstein, Ronald L., and Nicole A. Lazar. "The ASA statement on p-values: context, process, and purpose." (2016): The American Statistician: 129-133. DOI: https://doi.org/10.1080/00031305.2016.1154108
Other table helpers:
table_ester(), table_glue(), table_value()
# Guideline by the American Medical Association Manual of Style:
# Round p-values to 2 or 3 digits after the decimal point depending
# on the number of zeros. For example,
## - Change .157 to .16.
## - Change .037 to .04.
## - Don't change .047 to .05, because it will no longer be significant.
## - Keep .003 as is because 2 zeros after the decimal are fine.
## - Change .0003 or .00003 or .000003 to <.001
#
# In addition, the guideline states that "expressing P to more than 3
# significant digits does not add useful information." You may or may not
# agree with this guideline (I do not agree with parts of it),
# but you will (hopefully) appreciate `table_pvalue()` automating these
# recommendations if you submit papers to journals associated with
# the American Medical Association.

pvals_ama <- c(0.157, 0.037, 0.047, 0.003, 0.0003, 0.00003, 0.000003)
table_pvalue(pvals_ama)
# > [1] ".16" ".04" ".047" ".003" "<.001" "<.001" "<.001"

# `table_pvalue()` will fight valiantly to keep your p-value < alpha if
# it is < alpha. If it's >= alpha, `table_pvalue()` treats it normally.
pvals_close <- c(0.04998, 0.05, 0.050002)
table_pvalue(pvals_close)
# > [1] ".04998" ".05" ".05"
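To show how the lower inner and outer bounds interact, here is a deliberately simplified base R sketch. The helper name is hypothetical and this is not how table_pvalue() is implemented: it hard-codes the default lower bounds (0.01 and 0.001), ignores the upper bounds, and does not protect values just below alpha, which is why 0.047 is left out of the example.

```r
# Hypothetical helper: two decimals for p >= 0.01, three decimals for
# 0.001 <= p < 0.01, and "<.001" for anything smaller
format_p_sketch <- function(p) {
  out <- ifelse(
    p < 0.001, "<0.001",
    ifelse(p < 0.01, sprintf("%.3f", p), sprintf("%.2f", p))
  )
  # drop the leading zero, keeping a leading "<" if present
  sub("^(<?)0\\.", "\\1.", out)
}

formatted <- format_p_sketch(c(0.157, 0.037, 0.003, 0.0003))
formatted
# returns ".16" ".04" ".003" "<.001"
```

The alpha argument of table_pvalue() exists to handle the case this sketch skips, so for real manuscripts the package function is the safer choice.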
table_value() casts numeric vectors into character vectors. The main purpose of table_value() is to round and format numeric data for presentation.
table_value(x, rspec = NULL)
x |
a vector of numeric values. |
rspec |
a |
a vector of character values (rounded numbers).
Other table helpers:
table_ester(), table_glue(), table_pvalue()
table_value(0.123)
table_value(1.23)
table_value(12.3)
with(mtcars, table_value(disp))