| Title: | Bayesian "Now-Cast" Estimation of Event Probabilities in Multi-Party Democracies |
|---|---|
| Description: | An implementation of a Bayesian framework for the opinion poll based estimation of event probabilities in multi-party electoral systems (Bender and Bauer (2018) <doi:10.21105/joss.00606>). |
| Authors: | Andreas Bender [aut, cre] (ORCID: <https://orcid.org/0000-0001-5628-8611>), Alexander Bauer [aut] (ORCID: <https://orcid.org/0000-0003-3495-5131>), Rebekka Schade [ctb] |
| Maintainer: | Andreas Bender <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.6.27 |
| Built: | 2026-05-08 17:15:35 UTC |
| Source: | https://github.com/cran/coalitions |
Given a table with simulations in the rows and coalitions in the columns, this function returns the coalition probabilities for a specified coalition, by default excluding superior coalitions first
calculate_prob(majority_df, coalition, exclude_superior = TRUE, ...)calculate_prob(majority_df, coalition, exclude_superior = TRUE, ...)
majority_df |
A data frame containing logical values indicating if the coalitions (columns) have a majority (rows). |
coalition |
The coalition of interest for which superior coalitions
will be obtained by |
exclude_superior |
Logical. If |
... |
Further arguments passed to |
A data frame with one numeric column giving the coalition probability (percentage of simulations in which the coalition obtained a majority, after optionally excluding superior coalitions).
test_df <- data.frame( cdu = c(rep(FALSE, 9), TRUE), cdu_fdp = c(rep(FALSE, 8), TRUE, TRUE), cdu_fdp_greens = c(TRUE, TRUE, rep(FALSE, 6), TRUE, TRUE)) calculate_prob(test_df, "cdu_fdp_greens") # exclude_superior defaults to TRUE calculate_prob(test_df, "cdu_fdp_greens", exclude_superior=FALSE)test_df <- data.frame( cdu = c(rep(FALSE, 9), TRUE), cdu_fdp = c(rep(FALSE, 8), TRUE, TRUE), cdu_fdp_greens = c(TRUE, TRUE, rep(FALSE, 6), TRUE, TRUE)) calculate_prob(test_df, "cdu_fdp_greens") # exclude_superior defaults to TRUE calculate_prob(test_df, "cdu_fdp_greens", exclude_superior=FALSE)
Given a table with simulations in the rows and coalitions in the columns, this function returns the coalition probabilities for a specified coalition, by default excluding superior coalitions first
calculate_probs(majority_df, coalitions, exclude_superior = TRUE, ...)calculate_probs(majority_df, coalitions, exclude_superior = TRUE, ...)
majority_df |
A data frame containing logical values indicating if the coalitions (columns) have a majority (rows). |
coalitions |
A list of coalitions for which coalition probabilities should
be calculated. Each list entry must be a vector of party names. Those names
need to correspond to the names in |
exclude_superior |
Logical. If |
... |
Further arguments passed to |
A data frame with columns coalition (character) and
probability (numeric, 0–100), one row per coalition.
test_df <- data.frame( cdu = c(rep(FALSE, 9), TRUE), cdu_fdp = c(rep(FALSE, 8), TRUE, TRUE), cdu_fdp_greens = c(TRUE, TRUE, rep(FALSE, 6), TRUE, TRUE)) calculate_probs(test_df, list("cdu", "cdu_fdp", "cdu_fdp_greens")) calculate_probs(test_df, list("cdu", "cdu_fdp", "cdu_fdp_greens"), exclude_superior=FALSE)test_df <- data.frame( cdu = c(rep(FALSE, 9), TRUE), cdu_fdp = c(rep(FALSE, 8), TRUE, TRUE), cdu_fdp_greens = c(TRUE, TRUE, rep(FALSE, 6), TRUE, TRUE)) calculate_probs(test_df, list("cdu", "cdu_fdp", "cdu_fdp_greens")) calculate_probs(test_df, list("cdu", "cdu_fdp", "cdu_fdp_greens"), exclude_superior=FALSE)
Given a data frame containing multiple surveys (one row per survey), transforms the data into long format with one row per party.
collapse_parties( surveys, parties = c("cdu", "spd", "greens", "fdp", "left", "pirates", "fw", "afd", "bsw", "others") )collapse_parties( surveys, parties = c("cdu", "spd", "greens", "fdp", "left", "pirates", "fw", "afd", "bsw", "others") )
surveys |
A data frame with one survey per row. |
parties |
A character vector containing names of parties to collapse. |
Data frame in long format
emnid <- scrape_wahlrecht() emnid.long <- collapse_parties(emnid)emnid <- scrape_wahlrecht() emnid.long <- collapse_parties(emnid)
Calculates number of seats for the respective parties according to the method of d'Hondt.
dHondt(votes, parties, n_seats = 183)dHondt(votes, parties, n_seats = 183)
votes |
Number of votes per party. |
parties |
Names of parties (must be same length as votes). |
n_seats |
Number of seats in parliament. Defaults to 183 (seats in Austrian parliament). |
A named integer vector of seat counts, one entry per party,
in the same order as parties. The vector has a logical attribute
ties: TRUE if two or more parties had equal claim to the last
seat (i.e. the result is not uniquely determined and was resolved randomly),
FALSE otherwise. When ties = TRUE, re-running with a different
random seed may produce a different but equally valid seat distribution.
library(coalitions) library(dplyr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) %>% tidyr::unnest("survey") # calculate the seat distribution based on D'Hondt for a parliament with 300 seats dHondt(surveys$votes, surveys$party, n_seats = 300)library(coalitions) library(dplyr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) %>% tidyr::unnest("survey") # calculate the seat distribution based on D'Hondt for a parliament with 300 seats dHondt(surveys$votes, surveys$party, n_seats = 300)
Draw random numbers from posterior distribution
draw_from_posterior( survey, nsim = 10000, seed = as.numeric(now()), prior = NULL, correction = NULL )draw_from_posterior( survey, nsim = 10000, seed = as.numeric(now()), prior = NULL, correction = NULL )
survey |
survey object as returned by |
nsim |
number of simulations |
seed |
sets seed |
prior |
optional prior information. Defaults to 1/2 (Jeffrey's prior). |
correction |
A positive number. If not |
data.frame containing random draws from Dirichlet distribution
which can be interpreted as election results.
Given a table with simulations in the rows and coalitions in the columns, this function returns the coalition probabilities for a specified coalition, by default excluding superior coalitions first
get_probabilities( x, coalitions = list(c("cdu"), c("cdu", "fdp"), c("cdu", "fdp", "greens"), c("spd"), c("spd", "left"), c("spd", "left", "greens")), nsim = 1e+05, distrib.fun = sls, seats_majority = 300L, seed = as.numeric(now()), correction = NULL )get_probabilities( x, coalitions = list(c("cdu"), c("cdu", "fdp"), c("cdu", "fdp", "greens"), c("spd"), c("spd", "left"), c("spd", "left", "greens")), nsim = 1e+05, distrib.fun = sls, seats_majority = 300L, seed = as.numeric(now()), correction = NULL )
x |
A table containing one row per survey and survey information in
long format in a separate column named |
coalitions |
A list of coalitions for which coalition probabilities should
be calculated. Each list entry must be a vector of party names. Those names
need to correspond to the names in |
nsim |
number of simulations |
distrib.fun |
Function to calculate seat distribution. Defaults to
|
seats_majority |
The number of seats needed to obtain majority. |
seed |
sets seed |
correction |
A positive number. If not |
A tibble with the same rows as x (one per survey) and an
additional list-column probabilities containing a data frame of
coalition names and their probabilities (0–100) for each survey.
library(coalitions) library(dplyr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) # calculate probabilities for two coalitions probs <- get_probabilities(surveys, coalitions = list(c("cdu", "fdp"), c("spd", "left", "greens")), nsim = 100) # ensure fast runtime with only 100 simulations probs %>% tidyr::unnest("probabilities")library(coalitions) library(dplyr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) # calculate probabilities for two coalitions probs <- get_probabilities(surveys, coalitions = list(c("cdu", "fdp"), c("spd", "left", "greens")), nsim = 100) # ensure fast runtime with only 100 simulations probs %>% tidyr::unnest("probabilities")
Calculate seat distribution from draws from posterior
get_seats( dirichlet.draws, survey, distrib.fun = sls, samplesize = NULL, hurdle = 0.05, others = "others", ... )get_seats( dirichlet.draws, survey, distrib.fun = sls, samplesize = NULL, hurdle = 0.05, others = "others", ... )
dirichlet.draws |
Matrix containing random draws from posterior. |
survey |
The actual survey results on which |
distrib.fun |
Function to calculate seat distribution. Defaults to
|
samplesize |
Number of individuals participating in the |
hurdle |
The percentage threshold which has to be reached by a party to enter the parliament. Any party called "ssw" will be exempt from the hurdle. |
others |
A string indicating the name under which parties not listed explicitly are subsumed. |
... |
Further arguments passed to |
A data frame containing seat distributions for each simulation in
dirichlet.draws
draw_from_posterior, sls,
dHondt
library(coalitions) library(dplyr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) # simulate 100 seat distributions surveys <- surveys %>% mutate(draws = purrr::map(survey, draw_from_posterior, nsim = 100), seats = purrr::map2(draws, survey, get_seats)) surveys$seatslibrary(coalitions) library(dplyr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) # simulate 100 seat distributions surveys <- surveys %>% mutate(draws = purrr::map(survey, draw_from_posterior, nsim = 100), seats = purrr::map2(draws, survey, get_seats)) surveys$seats
Given a specific date, extract the survey from this date or the last one before this date.
get_surveys(country = "DE") get_surveys_by() get_surveys_rp() get_surveys_nds() get_surveys_saxony() get_surveys_brb() get_surveys_thuringen() get_latest(surveys = NULL, max_date = Sys.Date())get_surveys(country = "DE") get_surveys_by() get_surveys_rp() get_surveys_nds() get_surveys_saxony() get_surveys_brb() get_surveys_thuringen() get_latest(surveys = NULL, max_date = Sys.Date())
country |
Choose country from which surveys should be scraped.
Currently |
surveys |
If provided, latest survey will be obtained from this object,
otherwise calls |
max_date |
Specifies the date, relative to which latest survey will
be searched for. Defaults to |
Nested tibble. When fully unnested, the dataset contains the following columns:
Character name of the polling institute.
Publication date of the poll.
Start and end date of the field period, i.e. the dates during which the poll was conducted.
Number of respondents in the poll.
Character name of an individual party.
Percentage of respondents that chose the party. Given in
percentage points, i.e. 38% is given as 38.
Number of respondents that chose the party.
library(coalitions) get_surveys() library(coalitions) ### Scrape the newest poll for the German federal election # Possibility 1: Calling get_latest without arguments scrapes surveys from the web # Possibility 2: Use get_latest() on an already scraped dataset surveys <- get_latest(surveys_sample)library(coalitions) get_surveys() library(coalitions) ### Scrape the newest poll for the German federal election # Possibility 1: Calling get_latest without arguments scrapes surveys from the web # Possibility 2: Use get_latest() on an already scraped dataset surveys <- get_latest(surveys_sample)
Bar chart of the raw voter shares observed in one survey. Additionally to plotting positive voter shares, the function can be used to plot party-specific differences (e.g. between a survey and the election result), including negative numbers.
gg_survey(data, colors = NULL, labels = NULL, annotate_bars = TRUE, hurdle = 5)gg_survey(data, colors = NULL, labels = NULL, annotate_bars = TRUE, hurdle = 5)
data |
Scraped dataset containing one row per party in the column
|
colors |
Named vector containing party colors. If |
labels |
Named vector containing party labels. If |
annotate_bars |
If |
hurdle |
Hurdle for single parties to get into the parliament, e.g. '5'
for '5%'. If set to NULL no horizontal line is plotted.
The horizontal line can be suppressed using |
A ggplot object displaying voter shares as a bar chart.
library(tidyr) library(dplyr) library(coalitions) survey <- surveys_sample$surveys[[1]]$survey[[1]] gg_survey(survey)library(tidyr) library(dplyr) library(coalitions) survey <- surveys_sample$surveys[[1]]$survey[[1]] gg_survey(survey)
Calculates number of seats for the respective parties that have received more
than hurdle percent of votes (according to the method of Hare/Niemeyer)
hare_niemeyer(votes, parties, n_seats = 183)hare_niemeyer(votes, parties, n_seats = 183)
votes |
Number of votes per party. |
parties |
Names of parties (must be same length as votes). |
n_seats |
Number of seats in parliament. Defaults to 183 (seats in Austrian parliament). |
A data.frame containing parties above the hurdle and the respective
seats/percentages after redistribution via Hare/Niemeyer
library(coalitions) library(dplyr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) %>% tidyr::unnest("survey") # calculate the seat distribution based on Hare/Niemeyer for a parliament with 300 seats hare_niemeyer(surveys$votes, surveys$party, n_seats = 300)library(coalitions) library(dplyr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) %>% tidyr::unnest("survey") # calculate the seat distribution based on Hare/Niemeyer for a parliament with 300 seats hare_niemeyer(surveys$votes, surveys$party, n_seats = 300)
Do coalitions have a majority
have_majority( seats_tab, coalitions = list(c("cdu"), c("cdu", "fdp"), c("cdu", "fdp", "greens"), c("spd"), c("spd", "left"), c("spd", "left", "greens")), seats_majority = 300L, collapse = "_" )have_majority( seats_tab, coalitions = list(c("cdu"), c("cdu", "fdp"), c("cdu", "fdp", "greens"), c("spd"), c("spd", "left"), c("spd", "left", "greens")), seats_majority = 300L, collapse = "_" )
seats_tab |
A data frame containing number of seats obtained by a party.
Must have columns |
coalitions |
A list of coalitions for which coalition probabilities should
be calculated. Each list entry must be a vector of party names. Those names
need to correspond to the names in |
seats_majority |
The number of seats needed to obtain majority. |
collapse |
Character string passed to |
A data frame with one column per coalition. Each column is logical indicating whether the coalition obtained a majority in each simulation row.
library(coalitions) library(dplyr) library(purrr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) # check for majorities of two coalitions coals <- list(c("cdu", "fdp"), c("spd", "left", "greens")) # only use 100 simulations for a fast runtime surveys <- surveys %>% mutate(draws = map(survey, draw_from_posterior, nsim = 100), seats = map2(draws, survey, get_seats), majorities = map(seats, have_majority, coalitions = coals)) surveys$majoritieslibrary(coalitions) library(dplyr) library(purrr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) # check for majorities of two coalitions coals <- list(c("cdu", "fdp"), c("spd", "left", "greens")) # only use 100 simulations for a fast runtime surveys <- surveys %>% mutate(draws = map(survey, draw_from_posterior, nsim = 100), seats = map2(draws, survey, get_seats), majorities = map(seats, have_majority, coalitions = coals)) surveys$majorities
A vector of colors associated with German parties.
party_colors_departy_colors_de
A named character vector. Names indicate parties. Values contain color strings for the respective parties
A vector of labels associated with German parties.
party_labels_departy_labels_de
A named character vector. Names indicate parties. Values contain party names suitable for plot labels.
Per default, pools surveys starting from current date and going 14 days back. For each pollster within the defined time-frame, only the most recent survey is used.
pool_surveys( surveys, last_date = Sys.Date(), pollsters = c("allensbach", "emnid", "forsa", "fgw", "gms", "infratest", "dimap", "infratestdimap", "insa"), period = 14, period_extended = NA, corr = 0.5, weights = NULL )pool_surveys( surveys, last_date = Sys.Date(), pollsters = c("allensbach", "emnid", "forsa", "fgw", "gms", "infratest", "dimap", "infratestdimap", "insa"), period = 14, period_extended = NA, corr = 0.5, weights = NULL )
surveys |
A |
last_date |
Only surveys in the time-window from |
pollsters |
Character vector of pollsters that should be considered for pooling. |
period |
See |
period_extended |
Optional. If specified, all surveys in the time-window
from |
corr |
Assumed correlation between surveys (of different pollsters). Defaults to 0.5. |
weights |
Additional weights for individual surveys. |
A data frame with one row per party containing columns
pollster (set to "pooled"), date, start,
end, respondents (effective sample size), party,
percent, and votes.
library(coalitions) library(dplyr) latest <- get_latest(surveys_sample) pool_surveys(surveys_sample, last_date=as.Date("2017-09-02"))library(coalitions) library(dplyr) latest <- get_latest(surveys_sample) pool_surveys(surveys_sample, last_date=as.Date("2017-09-02"))
votes < hurdle
Calculate percentage of votes/seats after excluding parties with
votes < hurdle
redistribute(survey, hurdle = 0.05, others = "others", epsilon = 1e-05)redistribute(survey, hurdle = 0.05, others = "others", epsilon = 1e-05)
survey |
The actual survey results on which |
hurdle |
The percentage threshold which has to be reached by a party to enter the parliament. Any party called "ssw" will be exempt from the hurdle. |
others |
A string indicating the name under which parties not listed explicitly are subsumed. |
epsilon |
Percentages should add up to 1. If they do not, within accuracy
of |
A data frame with the same structure as survey but with
parties below the hurdle removed and vote percentages renormalized.
library(coalitions) library(dplyr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) # redistribute the shares of 'others' parties and parties with a share of under 5\% surveys <- surveys %>% mutate(survey_redist = purrr::map(survey, redistribute)) surveys$survey # results before redistribution surveys$survey_redist # results after redistributionlibrary(coalitions) library(dplyr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) # redistribute the shares of 'others' parties and parties with a share of under 5\% surveys <- surveys %>% mutate(survey_redist = purrr::map(survey, redistribute)) surveys$survey # results before redistribution surveys$survey_redist # results after redistribution
Scrapes survey tables and performs sanitation to output tidy data
scrape_wahlrecht( address = "https://www.wahlrecht.de/umfragen/emnid.htm", parties = c("CDU", "SPD", "GRUENE", "FDP", "LINKE", "PIRATEN", "AFD", "BSW", "SONSTIGE") ) scrape_by( address = "https://www.wahlrecht.de/umfragen/landtage/bayern.htm", parties = c("CSU", "SPD", "GRUENE", "FDP", "LINKE", "PIRATEN", "FW", "AFD", "SONSTIGE") ) scrape_rp( address = "https://www.wahlrecht.de/umfragen/landtage/rheinland-pfalz.htm", parties = c("CDU", "SPD", "GRUENE", "FDP", "LINKE", "AFD", "FW", "SONSTIGE"), ind_row_remove = -c(1:3) ) scrape_ltw( address = "https://www.wahlrecht.de/umfragen/landtage/niedersachsen.htm", parties = c("CDU", "SPD", "GRUENE", "FDP", "LINKE", "PIRATEN", "FW", "AFD", "BSW", "SONSTIGE"), ind_row_remove = -c(1:2) )scrape_wahlrecht( address = "https://www.wahlrecht.de/umfragen/emnid.htm", parties = c("CDU", "SPD", "GRUENE", "FDP", "LINKE", "PIRATEN", "AFD", "BSW", "SONSTIGE") ) scrape_by( address = "https://www.wahlrecht.de/umfragen/landtage/bayern.htm", parties = c("CSU", "SPD", "GRUENE", "FDP", "LINKE", "PIRATEN", "FW", "AFD", "SONSTIGE") ) scrape_rp( address = "https://www.wahlrecht.de/umfragen/landtage/rheinland-pfalz.htm", parties = c("CDU", "SPD", "GRUENE", "FDP", "LINKE", "AFD", "FW", "SONSTIGE"), ind_row_remove = -c(1:3) ) scrape_ltw( address = "https://www.wahlrecht.de/umfragen/landtage/niedersachsen.htm", parties = c("CDU", "SPD", "GRUENE", "FDP", "LINKE", "PIRATEN", "FW", "AFD", "BSW", "SONSTIGE"), ind_row_remove = -c(1:2) )
address |
http-address from which tables should be scraped. |
parties |
A character vector containing names of parties to collapse. |
ind_row_remove |
Negative vector of rows that will be skipped at the beginning. |
A tibble with one row per survey date and columns for date, respondents, and one column per party containing the percentage of votes.
library(coalitions) library(dplyr) scrape_wahlrecht() %>% slice(1:5) # Niedersachsen scrape_ltw() %>% slice(1:5) # Hessen scrape_ltw("https://www.wahlrecht.de/umfragen/landtage/hessen.htm", ind_row_remove=-c(1)) %>% slice(1:5)library(coalitions) library(dplyr) scrape_wahlrecht() %>% slice(1:5) # Niedersachsen scrape_ltw() %>% slice(1:5) # Hessen scrape_ltw("https://www.wahlrecht.de/umfragen/landtage/hessen.htm", ind_row_remove=-c(1)) %>% slice(1:5)
Calculates number of seats for the respective parties that have received more than 5% of votes (according to the method of Sainte-Lague/Schepers, see https://www.wahlrecht.de/verfahren/rangmasszahlen.html).
sls(votes, parties, n_seats = 598L)sls(votes, parties, n_seats = 598L)
votes |
A numeric vector giving the redistributes votes |
parties |
A character vector indicating the names of parties with
respective |
n_seats |
The total number of seats that can be assigned to the different parties. |
A named integer vector of seat counts, one entry per party,
in the same order as parties. The vector has a logical attribute
ties: TRUE if two or more parties had equal claim to the last
seat (i.e. the result is not uniquely determined and was resolved randomly),
FALSE otherwise. When ties = TRUE, re-running with a different
random seed may produce a different but equally valid seat distribution.
library(coalitions) library(dplyr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) %>% tidyr::unnest("survey") # calculate the seat distribution based on Sainte-Lague/Schepers for a parliament with 300 seats sls(surveys$votes, surveys$party, n_seats = 300)library(coalitions) library(dplyr) # get the latest survey for a sample of German federal election polls surveys <- get_latest(surveys_sample) %>% ungroup() %>% slice(1) %>% tidyr::unnest("survey") # calculate the seat distribution based on Sainte-Lague/Schepers for a parliament with 300 seats sls(surveys$votes, surveys$party, n_seats = 300)
A data set with surveys from seven different pollsters, three surveys per pollster. Surveys report support for different parties in the running for the German Bundestag prior to the 2017 election.
surveys_samplesurveys_sample
A nested data frame with 7 rows and 2 columns:
name of the pollster
a list of data frames, each containing one survey
Try call of read_html that throws an error if the url cannot be resolved
try_readHTML(url)try_readHTML(url)
url |
http-address that should be scraped. |
An xml_document object as returned by xml2::read_html.