Title: | This is a Collection of Functions to Analyse Gender Differences |
---|---|
Description: | Implementation of functions that combine binomial calculation and data visualisation to analyse the differences in publishing authorship by gender, as described in Day et al. (2020) <doi:10.1039/C9SC04090K>. It should only be used when self-reported gender is unavailable. |
Authors: | Rita Giordano [aut, cre], Aileen Day [aut], John Boyle [aut], Colin Batchelor [ctb], Royal Society of Chemistry [cph] |
Maintainer: | Rita Giordano <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-12-07 06:45:32 UTC |
Source: | CRAN |
This function uses a data source based on combined US/UK census data to assign gender based on first name.
assign_gender(data_df, first_name_col)
data_df | input dataframe containing the first name |
first_name_col | name of the first-name column used to assign gender |
The input data frame with the gender column:
gender - assigned gender (F/M/U)
gender <- assign_gender(authors, "first_name")
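The lookup idea can be sketched in base R as follows. This is only an illustration: the table `name_lookup`, its contents, and the lower-casing step are assumptions made for the example, not the package's bundled data or its internal implementation.

```r
# Hypothetical lookup table; the package ships its own census-based data.
name_lookup <- data.frame(
  first_name = c("anna", "john", "aileen"),
  gender = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

people <- data.frame(
  first_name = c("Anna", "John", "Zyx"),
  stringsAsFactors = FALSE
)

# Match on lower-cased names; names missing from the lookup get "U" (unknown).
idx <- match(tolower(people$first_name), name_lookup$first_name)
people$gender <- ifelse(is.na(idx), "U", name_lookup$gender[idx])
```

Names that cannot be matched are kept and marked "U" rather than dropped, so the row count of the input is preserved.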
This data set contains all the names from UK and US social security
authors
a data frame with 1000 rows of four variables:
first name
last name
country
publication year
Function to create a balloon plot of first names for a given gender
balloon_plot(data_df, gender_var, cutoff)
data_df | data frame containing 'first name' and 'gender' columns from |
gender_var | gender; possible values are F for female, M for male and U for unknown |
cutoff | numerical value indicating where to cut the counting data |
The output is a gg object from ggplot2 which shows the most frequent names as a balloon plot.
gender <- assign_gender(authors, "first_name")
bp <- balloon_plot(gender, "M", cutoff = 5)
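The counting step behind such a plot can be sketched in base R. The assumption here is that `cutoff` keeps names occurring at least that many times; the exact rule used by balloon_plot is not stated above, so treat this as one plausible reading. The sample names are made up.

```r
# Illustrative first names for one gender.
first_names <- c("john", "john", "john", "colin", "colin", "ernest")

# Tabulate names and sort by descending frequency.
name_counts <- sort(table(first_names), decreasing = TRUE)

# Assumed cutoff rule: keep names that occur at least `cutoff` times.
cutoff <- 2
frequent <- name_counts[name_counts >= cutoff]
names(frequent)
```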
Function to create a bar chart of the total number by gender
bar_chart(data_df, x_label, y_label)
data_df | dataframe from |
x_label | label for x axis. |
y_label | label for y axis. |
A bar chart, returned as a ggplot2 object, showing the total number per gender on the y axis and the level previously defined in total_gender_df on the x axis.
Calculate the female baseline from a dataframe containing the gender information.
baseline(data_df, gender_col)
data_df | dataframe containing the gender column. |
gender_col | the name of the column containing the gender information. |
The function returns a numeric vector containing the baseline values
## df is the dataframe in output from the function assign_gender
df <- data.frame(first_name = c("anna", "john", "ernest", "colin", "aileen"),
                 gender = c("F", "M", "M", "M", "F"),
                 stringsAsFactors = FALSE)
baseline <- baseline(df, gender_col = "gender")
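As a rough check of what a female baseline means, the share can be computed by hand in base R. This sketch assumes the baseline is the percentage of "F" among rows with a known gender (F or M); that definition is an assumption for illustration and is not taken from the package source.

```r
df <- data.frame(
  first_name = c("anna", "john", "ernest", "colin", "aileen"),
  gender = c("F", "M", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# Keep only rows with a known gender, then take the female share in percent.
known <- df$gender[df$gender %in% c("F", "M")]
female_baseline <- 100 * mean(known == "F")
female_baseline
```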
Create a bullet chart with significance bars to compare different baselines in percentage for gender analysis
bullet_chart(data_df, baseline_female, x_label, y_label, baseline_label)
data_df | dataframe in output from |
baseline_female | numeric vector containing the baseline for each level |
x_label | label for x axis |
y_label | label for y axis |
baseline_label | label used to define the baseline name. |
This function creates a bullet chart containing the percentage of submissions with the corresponding baseline for the level defined in percent_df.
Function to create a bullet chart and a line chart in the same graphical frame, to compare different baselines for gender analysis.
bullet_line_chart(data_df, baseline_female, x_label, y_bullet_chart_label, baseline_label, line_chart_df, line_chart_scaling, y_line_chart_label, line_label)
data_df | dataframe in output from |
baseline_female | numeric vector containing the baseline for each level |
x_label | label for x axis for both charts |
y_bullet_chart_label | label for y axis of the bullet chart |
baseline_label | label used to define the baseline name. |
line_chart_df | data frame containing the total number of submissions |
line_chart_scaling | factor of conversion for second y-axis |
y_line_chart_label | label for the y-axis of the line chart |
line_label | label used to define the line chart. |
The function creates a bullet chart containing the percentages of males and females with the corresponding baseline for the level defined in percent_df. The total number of submissions is displayed at the top of the bullet chart.
Function to calculate the lower CI, upper CI, percentages and counts, and significance of difference from one or multiple baseline percentages, given supplied confidence level using
calculate_binom_baseline(data_df, baseline_female, confidence_level = 0.95)
data_df | dataframe in output from |
baseline_female | female baseline in percentage from |
confidence_level | confidence level to use for significance calculation, default is 0.95 |
This function returns the input dataframe with the following additional columns:
lower_CI = lower bound of the confidence interval, expressed as a percentage
upper_CI = upper bound of the confidence interval, expressed as a percentage
lower_CI_count = lower bound of the confidence interval, expressed as a count
upper_CI_count = upper bound of the confidence interval, expressed as a count
significance = flag indicating whether the difference between the female percentage and the baseline percentage is significant for the row in consideration. Its value is "significant", or "" otherwise.
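A minimal sketch of this kind of calculation, using the exact binomial test from base R (stats::binom.test). Whether the package uses the same test internally is an assumption, as is the rule applied here that a row is "significant" when the baseline falls outside the confidence interval. The counts and the 40% baseline are made-up values; the column names mirror those listed above.

```r
# Illustrative female/male counts per level.
counts <- data.frame(
  level = c("2019", "2020"),
  female = c(20, 45),
  male = c(80, 55),
  stringsAsFactors = FALSE
)
baseline_pct <- 40       # assumed female baseline, in percent
confidence_level <- 0.95

# Exact binomial confidence interval for the female proportion in each row.
ci <- vapply(seq_len(nrow(counts)), function(i) {
  n_total <- counts$female[i] + counts$male[i]
  bt <- stats::binom.test(
    x = counts$female[i], n = n_total,
    p = baseline_pct / 100, conf.level = confidence_level
  )
  as.numeric(bt$conf.int)
}, numeric(2))

counts$lower_CI <- 100 * ci[1, ]
counts$upper_CI <- 100 * ci[2, ]

# Assumed rule: flag rows whose interval does not contain the baseline.
outside <- baseline_pct < counts$lower_CI | baseline_pct > counts$upper_CI
counts$significance <- ifelse(outside, "significant", "")
```

With these numbers the first level (20% female) lies well below the baseline and is flagged, while the second (45% female) has an interval that contains 40% and is not.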
This data set contains all the names from UK and US social security
gender_names
a data frame of two variables:
First name
Gender of the first name
Create a dataframe used as input for the stacked bar chart and the bullet chart, which show percentages to compare proportions between genders.
percent_df(data_df)
data_df | dataframe containing level, lower_CI, upper_CI, significance and female and male percentages from |
The output dataframe contains the columns x_values, y_values, gender, labels
Reshape a dataframe from long format to wide format.
reshape_for_binomials(data_df, gender_col, level)
data_df | dataframe containing the columns gender and counts |
gender_col | the name of the column containing the gender values. |
level | variable to compare for the baseline. |
The output is a dataframe containing more columns than the input one, such as:
level: the variable used to perform the binomials
total_for_level: the total amount of each gender including unknowns
total_female_male: the total amount of male and female
female_percentage: the percentage of female in the total_female_male
male_percentage: the percentage of male in the total_female_male
authors_df <- assign_gender(data_df = authors, first_name_col = "first_name")
female_count <- dplyr::count(authors_df, gender)
## create a new data frame to be used for the binomial calculation.
df_gender <- reshape_for_binomials(data_df = female_count, gender_col = "gender", level = 2020)
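The long-to-wide step and the derived totals can be sketched by hand in base R. The input counts and the intermediate column names `female`, `male` and `unknown` are assumptions made for this illustration; only the total and percentage column names come from the description above.

```r
# Long format: one row per gender with its count (values are illustrative).
long_counts <- data.frame(
  gender = c("F", "M", "U"),
  n = c(30, 60, 10),
  stringsAsFactors = FALSE
)

# Look up each gender's count by name.
n_by_gender <- setNames(long_counts$n, long_counts$gender)

# Wide format: one row per level, with totals and percentages.
wide <- data.frame(
  level = 2020,
  female = n_by_gender[["F"]],
  male = n_by_gender[["M"]],
  unknown = n_by_gender[["U"]]
)
wide$total_for_level <- wide$female + wide$male + wide$unknown
wide$total_female_male <- wide$female + wide$male
wide$female_percentage <- 100 * wide$female / wide$total_female_male
wide$male_percentage <- 100 * wide$male / wide$total_female_male
```

Percentages are taken over total_female_male rather than total_for_level, so unknowns do not dilute the female and male shares.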
Create a stacked bar chart with significance bars to compare with the female baseline for gender analysis.
stacked_bar_chart(data_df, baseline_female, x_label, y_label, baseline_label)
data_df | the output dataframe from |
baseline_female | female baseline in percentage from |
x_label | label for x axis |
y_label | label for y axis |
baseline_label | label used to define the baseline name. |
This function creates a bar chart containing the percentage of submissions with the corresponding baseline.
This function creates a gender diversity theme for charts based on ggplot2
theme_gd()
An object of class theme, as defined in ggplot2's own class system.
require(ggplot2)
ggplot(authors, aes(x = publication_years)) +
  geom_bar() +
  theme_gd()
Create a dataframe used as input for the bar chart of the total number of females and males
total_gender_df(data_df, level)
data_df | dataframe from |
level | name of level |
The output is a dataframe with the columns x_values, total_female_male, gender, y_values. This dataframe is the input to bar_chart.