Package 'GenderInfer' reference manual

Title:	This is a Collection of Functions to Analyse Gender Differences
Description:	Implementation of functions that combine binomial calculation and data visualisation to analyse the differences in publishing authorship by gender, as described in Day et al. (2020) <doi:10.1039/C9SC04090K>. It should only be used when self-reported gender is unavailable.
Authors:	Rita Giordano [aut, cre], Aileen Day [aut], John Boyle [aut], Colin Batchelor [ctb], Royal Society of Chemistry [cph]
Maintainer:	Rita Giordano <giordanor@rsc.org>
License:	MIT + file LICENSE
Version:	0.1.0
Built:	2025-03-07 06:56:03 UTC
Source:	CRAN

Assign gender by first name

Description

This function uses a data source based on combined US/UK census data to assign gender based on first name.

Usage

assign_gender(data_df, first_name_col)

Arguments

`data_df`	input dataframe containing the first names
`first_name_col`	name of the column containing the first names to assign gender to

Value

The input data frame with the gender column:

gender - assigned gender (F/M/U)

Examples

gender <- assign_gender(authors, "first_name")
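The name lookup can be illustrated with a minimal base R sketch. It is not the package's implementation: the `lookup` table is a made-up stand-in for the bundled `gender_names` data, and lower-casing the names before matching is an assumption.

```r
# Hypothetical lookup table standing in for the package's `gender_names` data.
lookup <- data.frame(
  Name = c("anna", "john", "aileen"),
  UKUS_Gender = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

people <- data.frame(
  first_name = c("Anna", "John", "Xyzzy", "AILEEN"),
  stringsAsFactors = FALSE
)

# Match on lower-cased names (an assumption); match() returns NA for
# names that are absent from the lookup table.
idx <- match(tolower(people$first_name), lookup$Name)

# Names without a match are labelled "U" (unknown).
people$gender <- ifelse(is.na(idx), "U", lookup$UKUS_Gender[idx])
```

In this sketch `people$gender` holds F, M, U, F: the name that is missing from the lookup table falls back to U.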

names dataset

Description

This data set contains all the names from UK and US social security.

Usage

authors

Format

A data frame with 1000 rows and four variables:

first_name: first name
last_name: last name
country_code: country
publication_years: publication year

Function to create the balloon plot for gender first name

Description

Function to create the balloon plot for gender first name

Usage

balloon_plot(data_df, gender_var, cutoff)

Arguments

`data_df`	data frame containing 'first name' and 'gender' columns from `assign_gender`
`gender_var`	gender; possible values are F for female, M for male and U for unknown
`cutoff`	numerical value indicating where to cut the counting data

Value

The output is a gg object from ggplot2 which shows the most frequent names as a balloon plot.

Examples

gender <- assign_gender(authors, "first_name")
bp <- balloon_plot(gender, "M", cutoff = 5)
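The role of `cutoff` can be pictured as filtering a table of name counts. The base R sketch below is only an illustration under assumptions: the manual does not say whether the cut is inclusive, so keeping names with a count greater than or equal to the cutoff is a guess, and `first_names`, `name_counts` and `frequent` are hypothetical names.

```r
# Hypothetical vector of first names already assigned to one gender.
first_names <- c("john", "john", "colin", "john", "colin", "ernest")

# Count how often each name occurs; the count column is called `n`.
name_counts <- as.data.frame(
  table(first_name = first_names),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Assumption: names are kept when their count reaches the cutoff.
cutoff <- 2
frequent <- name_counts[name_counts$n >= cutoff, ]
```

With these values only colin (2) and john (3) remain, which are the names a balloon plot of the most frequent names would then display.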

Function to create a bar chart of the total number by gender

Description

Function to create a bar chart of the total number by gender

Usage

bar_chart(data_df, x_label, y_label)

Arguments

`data_df`	dataframe from `total_gender_df`
`x_label`	label for x axis.
`y_label`	label for y axis.

Value

A bar chart, as a ggplot2 object, showing the total number per gender on the y axis and the level previously defined in total_gender_df on the x axis.

Calculate the female baseline

Description

baseline calculates the female baseline given a dataframe containing the gender information.

Usage

baseline(data_df, gender_col)

Arguments

`data_df`	dataframe containing the gender column.
`gender_col`	the name of the column containing the gender information.

Value

The function returns a numeric vector containing the baseline values

Examples

## df is the dataframe in output from the function assign_gender
df <- data.frame(first_name = c("anna", "john", "ernest", "colin", "aileen"),
                 gender = c("F", "M",  "M", "M", "F"),
                 stringsAsFactors = FALSE)
## store the result under a different name so the function is not masked
female_baseline <- baseline(df, gender_col = "gender")
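The manual does not spell out the formula behind the baseline. Assuming it is the percentage of names assigned F among those assigned F or M (unknowns excluded, in line with how `female_percentage` is described for `reshape_for_binomials`), the calculation can be sketched in base R as follows. The variable names are illustrative only.

```r
# Same gender column as in the example data frame above.
df <- data.frame(
  gender = c("F", "M", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# Count female and male entries; "U" entries would be ignored here.
n_female <- sum(df$gender == "F")
n_male <- sum(df$gender == "M")

# Assumed definition: female share of female + male, as a percentage.
female_share <- 100 * n_female / (n_female + n_male)
```

For this data frame the sketch gives 40, since two of the five names are assigned F.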

Create a bullet chart with significance bars to compare different baselines in percentage for gender analysis

Description

Create a bullet chart with significance bars to compare different baselines in percentage for gender analysis

Usage

bullet_chart(data_df, baseline_female, x_label, y_label, baseline_label)

Arguments

`data_df`	dataframe in output from `percent_df`
`baseline_female`	numeric vector containing the baseline for each level
`x_label`	label for x axis
`y_label`	label for y axis
`baseline_label`	label used to define the baseline name.

Value

This function creates a bullet chart containing the percentage of submissions with the corresponding baseline for the level defined in percent_df.

Function to create a bullet chart with a line chart in the same graphical frame; to compare different baselines for gender analysis.

Description

Function to create a bullet chart with a line chart in the same graphical frame; to compare different baselines for gender analysis.

Usage

bullet_line_chart(
  data_df,
  baseline_female,
  x_label,
  y_bullet_chart_label,
  baseline_label,
  line_chart_df,
  line_chart_scaling,
  y_line_chart_label,
  line_label
)

Arguments

`data_df`	dataframe in output from `percent_df`
`baseline_female`	numeric vector containing the baseline for each level
`x_label`	label for x axis for both charts
`y_bullet_chart_label`	label for y axis of the bullet chart
`baseline_label`	label used to define the baseline name.
`line_chart_df`	data frame containing the total number of submissions
`line_chart_scaling`	factor of conversion for second y-axis
`y_line_chart_label`	label for the y-axis of the line chart
`line_label`	label used to define the line chart.

Value

The function creates a bullet chart containing the percentage of male and female with the corresponding baseline for the level defined in percent_df. The total number of submissions is displayed on top of the bullet chart.

Calculate binomials and significance for multiple baselines.

Description

Function to calculate the lower CI, upper CI, percentages and counts, and significance of difference from one or multiple baseline percentages, given supplied confidence level using

Usage

calculate_binom_baseline(data_df, baseline_female, confidence_level = 0.95)

Arguments

`data_df`	dataframe in output from `reshape_for_binomials` containing the columns female and male, which hold the integer counts of females and males respectively; each must be a numeric vector with values greater than 0.
`baseline_female`	female baseline in percentage from `baseline`.
`confidence_level`	confidence level to use for significance calculation, default is 0.95

Value

This function returns the input dataframe with the following additional columns:

lower_CI = lower bound of the confidence interval, expressed as a percentage

upper_CI = upper bound of the confidence interval, expressed as a percentage

lower_CI_count = lower bound of the confidence interval, expressed as a count

upper_CI_count = upper bound of the confidence interval, expressed as a count

significance = flag indicating whether the difference between the female percentage and the baseline percentage is significant for the row in consideration. Its value is "significant", or "" if the difference is not significant.
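The manual does not name the binomial method used, so the following base R sketch is an assumption rather than the package's implementation. It uses the exact (Clopper-Pearson) interval from `stats::binom.test` and flags a row as significant when the baseline falls outside the interval; the input counts are made up.

```r
# Made-up counts for one level, and a baseline expressed in percent.
female <- 20
male <- 80
baseline_female <- 40
confidence_level <- 0.95

total <- female + male

# Exact binomial confidence interval for the female proportion.
ci <- binom.test(x = female, n = total,
                 conf.level = confidence_level)$conf.int

# Express the interval bounds as percentages and as counts.
lower_CI <- 100 * ci[1]
upper_CI <- 100 * ci[2]
lower_CI_count <- ci[1] * total
upper_CI_count <- ci[2] * total

# Assumed rule: significant when the baseline lies outside the interval.
outside <- baseline_female < lower_CI || baseline_female > upper_CI
significance <- if (outside) "significant" else ""
```

Here the observed female share is 20%, the interval sits well below the 40% baseline, and the row is therefore flagged as significant.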

Gender names dataset

Description

This data set contains all the names from UK and US social security.

Usage

gender_names

Format

A data frame with two variables:

Name: First name
UKUS_Gender: Gender of the first name

Create a dataframe that will be the input to generate stacked bar chart and bullet chart that show percentage to compare proportions among gender.

Description

Create a dataframe that will be the input to generate stacked bar chart and bullet chart that show percentage to compare proportions among gender.

Usage

percent_df(data_df)

Arguments

`data_df`	dataframe containing level, lower_CI, upper_CI, significance and female and male percentages from `calculate_binom_baseline`

Value

The output dataframe contains the columns x_values, y_values, gender, labels

Reshape the dataframe to make it easier to carry out binomial calculations.

Description

Reshape the dataframe from long format to wide format.

Usage

reshape_for_binomials(data_df, gender_col, level)

Arguments

`data_df`	dataframe containing the columns gender and counts
`gender_col`	the name of the column containing the gender values.
`level`	variable to compare for the baseline.

Value

The output is a dataframe containing more columns than the input one, such as:

level: the variable used to perform the binomials
total_for_level: the total amount of each gender including unknowns
total_female_male: the total amount of male and female
female_percentage: the percentage of female in the total_female_male
male_percentage: the percentage of male in the total_female_male

Examples

authors_df <- assign_gender(data_df = authors, first_name_col = "first_name")
female_count <- dplyr::count(authors_df, gender)

## create a new data frame to be used for the binomial calculation.
df_gender <- reshape_for_binomials(data_df = female_count, gender_col = "gender",
                                  level = 2020)
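To make the output columns listed under Value concrete, here is a base R sketch of the same long-to-wide step. It follows the column definitions given in the manual but is not the package's code; the counts are made up and the `unknown` column and `count_for` helper are hypothetical.

```r
# Made-up long-format counts, one row per gender.
counts <- data.frame(
  gender = c("F", "M", "U"),
  n = c(30, 60, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total count for one gender value.
count_for <- function(g) sum(counts$n[counts$gender == g])

# One wide row for the level being analysed.
wide <- data.frame(
  level = 2020,
  female = count_for("F"),
  male = count_for("M"),
  unknown = count_for("U")
)

# Totals with and without unknowns, then percentages of female + male.
wide$total_for_level <- wide$female + wide$male + wide$unknown
wide$total_female_male <- wide$female + wide$male
wide$female_percentage <- 100 * wide$female / wide$total_female_male
wide$male_percentage <- 100 * wide$male / wide$total_female_male
```

Note that the percentages are taken over female plus male only, so the 10 unknown entries change `total_for_level` but not `female_percentage`.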

Create a stacked bar chart with significance bars to compare with the female baseline for gender analysis.

Description

Create a stacked bar chart with significance bars to compare with the female baseline for gender analysis.

Usage

stacked_bar_chart(data_df, baseline_female, x_label, y_label, baseline_label)

Arguments

`data_df`	output dataframe from `percent_df`
`baseline_female`	female baseline in percentage from `baseline`
`x_label`	label for x axis
`y_label`	label for y axis
`baseline_label`	label used to define the baseline name.

Value

This function creates a bar chart containing the percentage of submissions with the corresponding baseline.

This function creates a gender diversity theme for charts based on ggplot2

Description

This function creates a gender diversity theme for charts based on ggplot2

Usage

theme_gd()

Value

An object of class theme, as defined in ggplot2's own class system.

Examples

library(ggplot2)
ggplot(authors, aes(x = publication_years)) + geom_bar() + theme_gd()

Create a dataframe that will be the input to generate the bar chart of the full amount of female and male

Description

Create a dataframe that will be the input to generate the bar chart of the full amount of female and male

Usage

total_gender_df(data_df, level)

Arguments

`data_df`	dataframe from `calculate_binom_baseline` containing Level, LCI, UCI, Significance and Male and Female percentages
`level`	name of level

Value

The output is a dataframe with the columns x_values, total_female_male, gender, y_values. This data frame is the input used by `bar_chart` to create the bar chart.
