Title: | Higher Criticism Test of Two Frequency Counts Tables |
---|---|
Description: | Higher Criticism (HC) test between two frequency tables. Test is based on an adaptation of the Tukey-Donoho-Jin HC statistic to testing frequency tables described in Kipnis (2019) <arXiv:1911.01208>. |
Authors: | Alon Kipnis <[email protected]> |
Maintainer: | Alon Kipnis <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.2 |
Built: | 2024-11-01 11:39:14 UTC |
Source: | CRAN |
Compute the HC statistic and the HC threshold given a list of P-values.
Can be used with the function two.sample.pvals to get a list of P-values discriminating each feature between the two tables.
stbl – normalize using the expected P-value (stbl == TRUE) or the observed P-value (stbl == FALSE)
alpha – lower fraction of P-values to use
HC.vals(pv, alpha = 0.45, stbl = TRUE)
pv | A list of numbers between 0 and 1. |
alpha | A number between 0 and 1. |
stbl | A boolean. |
A list containing the following fields:
HC – Higher Criticism (HC) score
HC.star – HC score corrected for finite samples
p – P-value attaining HC
p.star – P-value attaining HC.star
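The fields above can be illustrated with a short base R sketch. It is not the package's source code: the function name `hc_sketch`, the choice of i/(N+1) as the expected value of the i-th smallest P-value, and the rule of skipping P-values below 1/N for the finite-sample variant are assumptions made here, in the spirit of the Donoho-Jin HC statistic.

```r
# Illustrative sketch only; hc_sketch is a hypothetical name, not a package function.
hc_sketch <- function(pv, alpha = 0.45, stbl = TRUE) {
  pv <- sort(pv)                        # ascending P-values
  n  <- length(pv)
  uu <- seq_len(n) / (n + 1)            # ASSUMPTION: expected i-th smallest P-value under the null
  # Standardize by the expected (stbl = TRUE) or observed (stbl = FALSE) P-values.
  denom <- if (stbl) sqrt(uu * (1 - uu)) else sqrt(pv * (1 - pv))
  zz <- sqrt(n) * (uu - pv) / denom
  k  <- max(1, floor(alpha * n))        # search only the lower alpha fraction
  i_max <- which.max(zz[seq_len(k)])
  # ASSUMPTION: the finite-sample variant ignores P-values smaller than 1/n.
  ok <- which(pv[seq_len(k)] >= 1 / n)
  i_star <- if (length(ok) > 0) ok[which.max(zz[ok])] else i_max
  list(HC = zz[i_max], HC.star = zz[i_star],
       p = pv[i_max], p.star = pv[i_star])
}

res <- hc_sketch(c(0.001, 0.2, 0.5, 0.8, 0.9, 0.3, 0.6, 0.05, 0.7, 0.4))
str(res)
```

Because HC.star is maximized over a subset of the indices used for HC, it can never exceed HC in this sketch. With stbl = FALSE, P-values equal to 0 or 1 give a zero denominator, so the stbl = TRUE form is the safer default here.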
tb1 = table(c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,5,6,6,7,7,7))
tb2 = table(c(1,1,1,1,1,1,1,1,1,2,3,3,3,3,3,4,4,4,5,5,5,6))
PV = two.sample.pvals(tb1, tb2) # compute P-values
HC.vals(PV$pv) # combine P-values using the HC statistics

# Can be used to check similarity of word-frequencies in texts:
text1 = "On the day House Democrats opened an impeachment inquiry of President Trump last week, Pete Buttigieg was being grilled by Iowa voters on other subjects: how to loosen the grip of the rich on government, how to restore science to policymaking, how to reduce child poverty. At an event in eastern Iowa, a woman rose to say that her four adult children were `stuck' in life, unable to afford what she had in the 1980s when a $10-an-hour job paid for rent, utilities and an annual vacation."
text2 = "How can the federal government help our young people that want to do whats right and to get to those things that their parents worked so hard for? the voter asked. This is the conversation Mr. Buttigieg wants to have. Boasting a huge financial war chest but struggling in the polls, Mr. Buttigieg is now staking his presidential candidacy on Iowa, and particularly on connecting with rural white voters who want to talk about personal concerns more than impeachment. In doing so, Mr. Buttigieg is also trying to show how Democrats can win back counties that flipped from Barack Obama to Donald Trump in 2016 — there are more of them in Iowa than any other state — by focusing, he said, on “the things that are going to affect folks’ lives in a concrete way."
tb1 = table(strsplit(tolower(text1),' '))
tb2 = table(strsplit(tolower(text2),' '))
pv = two.sample.pvals(tb1,tb2)
HC.vals(pv$pv)
Compute the HC statistic directly from two one-way contingency tables.
stbl – normalize using the expected P-value (stbl == TRUE) or the observed P-value (stbl == FALSE)
alpha – lower fraction of P-values to use
two.sample.HC(tb1, tb2, alpha = 0.45, stbl = TRUE)
tb1 | A one-way table with integer counts. |
tb2 | A one-way table with integer counts. |
alpha | A number between 0 and 1. |
stbl | A boolean. |
A list containing the following fields:
HC – Higher Criticism (HC) score
HC.star – HC score corrected for finite samples
p – P-value attaining HC
p.star – P-value attaining HC.star
text1 = "On the day House Democrats opened an impeachment inquiry of President Trump last week, Pete Buttigieg was being grilled by Iowa voters on other subjects: how to loosen the grip of the rich on government, how to restore science to policymaking, how to reduce child poverty. At an event in eastern Iowa, a woman rose to say that her four adult children were `stuck' in life, unable to afford what she had in the 1980s when a $10-an-hour job paid for rent, utilities and an annual vacation."
text2 = "How can the federal government help our young people that want to do whats right and to get to those things that their parents worked so hard for? the voter asked. This is the conversation Mr. Buttigieg wants to have. Boasting a huge financial war chest but struggling in the polls, Mr. Buttigieg is now staking his presidential candidacy on Iowa, and particularly on connecting with rural white voters who want to talk about personal concerns more than impeachment. In doing so, Mr. Buttigieg is also trying to show how Democrats can win back counties that flipped from Barack Obama to Donald Trump in 2016 — there are more of them in Iowa than any other state — by focusing, he said, on “the things that are going to affect folks’ lives in a concrete way."
tb1 = table(strsplit(tolower(text1),' '))
tb2 = table(strsplit(tolower(text2),' '))
pv = two.sample.pvals(tb1,tb2)
HC.vals(pv$pv)
Align the two tables and apply an exact binomial test (binom.test) to each feature. Alignment is done by an outer merge; features missing from one table are given a count of zero.
two.sample.pvals(tb1, tb2)
tb1 | A one-way table with integer counts. |
tb2 | A one-way table with integer counts. |
A table with the pair of counts for each feature and the P-value associated with each pair.
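The align-then-test procedure can be sketched in base R as follows. This is an illustration under assumptions rather than the package's implementation: the name `pvals_sketch`, the output column names, and in particular the null success probability `p0` (taken here as the first table's share of the total count) are choices made for this example only.

```r
# Illustrative sketch only; pvals_sketch is a hypothetical name, not a package function.
pvals_sketch <- function(tb1, tb2) {
  d1 <- data.frame(feature = names(tb1), n1 = as.vector(tb1),
                   stringsAsFactors = FALSE)
  d2 <- data.frame(feature = names(tb2), n2 = as.vector(tb2),
                   stringsAsFactors = FALSE)
  m <- merge(d1, d2, by = "feature", all = TRUE)   # outer merge on the feature label
  m$n1[is.na(m$n1)] <- 0                           # features absent from a table get count 0
  m$n2[is.na(m$n2)] <- 0
  # ASSUMPTION: under the null, a count falls in table 1 with probability
  # equal to table 1's share of all counts.
  p0 <- sum(m$n1) / (sum(m$n1) + sum(m$n2))
  # Exact binomial test per feature: n1 successes out of n1 + n2 trials.
  m$pv <- mapply(function(x, y) binom.test(x, x + y, p = p0)$p.value,
                 m$n1, m$n2)
  m
}

tb1 <- table(c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,5,6,6,7,7,7))
tb2 <- table(c(1,1,1,1,1,1,1,1,1,2,3,3,3,3,3,4,4,4,5,5,5,6))
pvals_sketch(tb1, tb2)
```

Every feature kept by the outer merge has a positive count in at least one table, so the number of trials passed to binom.test is always at least 1.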
tb1 = table(c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,5,6,6,7,7,7))
tb2 = table(c(1,1,1,1,1,1,1,1,1,2,3,3,3,3,3,4,4,4,5,5,5,6))
PV = two.sample.pvals(tb1, tb2) # compute P-values
HC.vals(PV$pv) # use the Higher-Criticism to combine the P-values
               # for a global test

# Can be used to check similarity of word-frequencies in texts:
text1 = "On the day House Democrats opened an impeachment inquiry of President Trump last week, Pete Buttigieg was being grilled by Iowa voters on other subjects: how to loosen the grip of the rich on government, how to restore science to policymaking, how to reduce child poverty. At an event in eastern Iowa, a woman rose to say that her four adult children were `stuck' in life, unable to afford what she had in the 1980s when a $10-an-hour job paid for rent, utilities and an annual vacation."
text2 = "How can the federal government help our young people that want to do whats right and to get to those things that their parents worked so hard for? the voter asked. This is the conversation Mr. Buttigieg wants to have. Boasting a huge financial war chest but struggling in the polls, Mr. Buttigieg is now staking his presidential candidacy on Iowa, and particularly on connecting with rural white voters who want to talk about personal concerns more than impeachment. In doing so, Mr. Buttigieg is also trying to show how Democrats can win back counties that flipped from Barack Obama to Donald Trump in 2016 — there are more of them in Iowa than any other state — by focusing, he said, on “the things that are going to affect folks’ lives in a concrete way."
tb1 = table(strsplit(tolower(text1),' '))
tb2 = table(strsplit(tolower(text2),' '))
pv = two.sample.pvals(tb1,tb2)
HC.vals(pv$pv)