Title: | Modern Nonparametric Tools for Two-Sample Quantile and Distribution Comparisons |
---|---|
Description: | Allows practitioners to determine (i) if two univariate distributions (which can be continuous, discrete, or even mixed) are equal, (ii) how two distributions differ (shape differences, e.g., location, scale, etc.), and (iii) where two distributions differ (at which quantiles), all using nonparametric LP statistics. The primary reference is Jungreis, D. (2019, Technical Report). |
Authors: | David Jungreis, Subhadeep Mukhopadhyay |
Maintainer: | David Jungreis <[email protected]> |
License: | GPL-2 |
Version: | 3.0 |
Built: | 2024-11-10 06:33:55 UTC |
Source: | CRAN |
Jungreis, D., (2019) "Unification of Continuous, Discrete, and Mixed Distribution Two-Sample Testing with Inferences in the Quantile Domain"
Mukhopadhyay, S., (2013) "Nonparametric Inference for High Dimensional Data," Ph.D. diss., Texas A&M University, College Station, Texas.
Mukhopadhyay, S. and Parzen, E. (2014), "LP Approach to Statistical Modeling", arXiv:1405.2601.
x <- c(rep(0,200),rep(1,200))
y <- c(rnorm(200,0,1),rnorm(200,1,1))
L <- LP.QDC(x,y)
L$pval
These data come from the depression study of Jackson et al. (2009), as used by Wilcox et al. (2014).
data(Depression)
A data frame with 372 observations on the following 2 variables.
x
A binary indicator variable: 0 for control, 1 for intervention (received therapy)
y
The response variable: CESD score (higher means more depressed)
Jackson, J., Mandel, D., Blanchard, J., Carlson, M., Cherry, B., Azen, S., Chou, C.P., Jordan-Marsh, M., Forman, T., White, B., et al. (2009), "Confronting challenges in intervention research with ethnically diverse older adults: the USC Well Elderly II trial," Clinical Trials, 6, 90-101.
Wilcox, R. R., Erceg-Hurn, D. M., Clark, F., and Carlson, M. (2014), "Comparing two independent groups via the lower and upper quantiles," Journal of Statistical Computation and Simulation, 84, 1543-1551.
data(Depression)
## maybe str(Depression)
y <- Depression[,2]
x <- Depression[,1]
hist(y[x==1])
These data come from LaLonde's (1986) National Supported Work Demonstration (NSW) data (the Dehejia-Wahba (1999) sample), used by Firpo (2007).
data(Earnings1978)
A data frame with 445 observations on the following 2 variables.
x
A binary indicator variable: 0 for control, 1 for intervention (received job training)
y
The response variable: earnings in 1978
http://users.nber.org/~rdehejia/data/nswdata2.html
Dehejia, R. H. and Wahba, S. (1999), "Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs," Journal of the American Statistical Association, 94, 1053-1062.
Firpo, S. (2007), "Efficient semiparametric estimation of quantile treatment effects," Econometrica, 75, 259-276.
LaLonde, R. J. (1986), "Evaluating the econometric evaluations of training programs with experimental data," The American Economic Review, 604-620.
data(Earnings1978)
## maybe str(Earnings1978)
y <- Earnings1978[,2]
x <- Earnings1978[,1]
hist(y[x==1])
Given a random sample from X (which can be discrete, continuous, or even mixed), this function computes the empirical LP-basis functions.
eLP.poly(x,m)
x | The random samples |
m | Number of basis functions to compute |
LP basis functions
Subhadeep Mukhopadhyay
Jungreis, D., (2019) "Unification of Continuous, Discrete, and Mixed Distribution Two-Sample Testing with Inferences in the Quantile Domain"
Mukhopadhyay, S. and Parzen, E. (2014), "LP Approach to Statistical Modeling", arXiv:1405.2601.
x <- c(rep(0,200),rep(1,200))
m <- 6
eLP.poly(x,m)
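For intuition about what such a basis can look like, the sketch below builds orthonormal score functions from the mid-rank transform of a sample, in the spirit of Mukhopadhyay and Parzen (2014). The name `lp_basis_sketch` is hypothetical, and the construction (orthogonal polynomials of the mid-ranks via base R's `poly`) is an assumption made for illustration. It is not taken from the package source and may differ from what `eLP.poly` returns.

```r
# Hypothetical helper, NOT a package function: an illustrative construction
# of orthonormal score functions from the mid-rank transform of a sample.
lp_basis_sketch <- function(x, m) {
  stopifnot(length(unique(x)) >= 2)
  n <- length(x)
  # Mid-distribution transform: (number below + half the ties) / n.
  # Average ranks handle continuous, discrete, and mixed samples alike.
  u <- (rank(x, ties.method = "average") - 0.5) / n
  # A sample with k distinct values supports at most k - 1 score functions
  m <- min(m, length(unique(u)) - 1)
  # Orthogonal polynomials in u, rescaled so each column has mean square 1
  S <- sqrt(n) * matrix(as.vector(poly(u, degree = m)), nrow = n)
  colnames(S) <- paste0("T", seq_len(m))
  S
}

# A binary sample has two distinct values, so only one score function results
x <- c(rep(0, 200), rep(1, 200))
S <- lp_basis_sketch(x, 6)
dim(S)
```

Capping `m` at one less than the number of distinct values is what lets the same code run on discrete inputs without failing inside `poly`.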
These data come from Gneezy and List's (2006) fundraising experiment, on which Goldman and Kaplan (2018) performed a quantile treatment effect analysis. These data correspond to the "pre-lunch" period.
data(Fundraising)
A data frame with 23 observations on the following 2 variables.
x
A binary indicator variable: 0 for control, 1 for intervention (gift wage)
y
The response variable: dollars raised
Gneezy, U. and List, J. A. (2006), "Putting behavioral economics to work: Testing for gift exchange in labor markets using field experiments," Econometrica, 74, 1365-1384.
Goldman, M. and Kaplan, D. M. (2018), "Comparing distributions by multiple testing across quantiles or CDF values," Journal of Econometrics, Volume 206, Issue 1, 143-166.
data(Fundraising)
## maybe str(Fundraising)
y <- Fundraising[,2]
x <- Fundraising[,1]
hist(y[x==1])
This function runs the entire quantile and distribution comparison, returning the LP comoments, LP coefficients, LPINFOR test statistic, p-value, estimated comparison density with null band, and the intervals where the comparison density lies above or below the null band.
LP.QDC(x,y,m=6,smooth="TRUE",method="BIC",alpha=0.05, B=1000,spar=NA,plot="TRUE",inset=-0.2)
x | Indicator variable denoting group membership |
y | Response variable with measured values |
m | Number of LP comoments and LP coefficients to be calculated, default: 6 |
smooth | Whether smoothing should be applied, default: TRUE |
method | Smoothing method, either AIC or BIC, default: BIC |
alpha | Alpha-level for confidence bands, default: 0.05 |
B | Number of permutations of the x labels, default: 1000 |
spar | "spar" in "smooth.spline" for the upper and lower bounds of the confidence bands, default: NA, which lets smooth.spline choose the value |
plot | Whether plotting should be performed, default: TRUE |
inset | Graphical parameter that controls where the color legend is plotted below the x-axis, default: -0.2 |
A list containing:
band | y-values of the upper and lower bounds of the confidence band |
d.hat | y-values of the comparison density |
sparL | "spar" value in "smooth.spline" of the lower bound of the null band |
sparU | "spar" value in "smooth.spline" of the upper bound of the null band |
out.above | Quantile intervals where group 1 dominates the pooled distribution |
out.below | Quantile intervals where group 0 dominates the pooled distribution |
LP.comoment.0 | LP comoments, conditioned on X=0 |
LP.coef.0 | LP coefficients, conditioned on X=0 |
LP.comoment.1 | LP comoments, conditioned on X=1 |
LP.coef.1 | LP coefficients, conditioned on X=1 |
LPINFOR | Value of the LPINFOR test statistic |
pval | The p-value for testing equality of the two distributions, F0=F1 |
David Jungreis
Subhadeep Mukhopadhyay
Jungreis, D., (2019) "Unification of Continuous, Discrete, and Mixed Distribution Two-Sample Testing with Inferences in the Quantile Domain"
Mukhopadhyay, S. and Parzen, E. (2014), "LP Approach to Statistical Modeling", arXiv:1405.2601.
x <- c(rep(0,200),rep(1,200))
y <- c(rnorm(200,0,1),rnorm(200,1,1))
L <- LP.QDC(x,y)
L$pval
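To illustrate how permuting the x labels B times can produce a null band, and how intervals such as out.above and out.below can be read off it, here is a minimal base R sketch. It swaps in a crude histogram-style comparison density (the share of group 1 within bins of the pooled mid-ranks, divided by the overall share of group 1) for the package's LP-based estimate. The names `binned_density` and `null_band_sketch` and the bin count `k` are hypothetical and do not belong to the package.

```r
# Hypothetical helper, NOT a package function: histogram-style comparison
# density of group 1 against the pooled sample on k bins of the pooled
# mid-ranks. Assumes every bin is non-empty (true for continuous y).
binned_density <- function(x, y, k = 10) {
  u <- (rank(y, ties.method = "average") - 0.5) / length(y)
  bin <- cut(u, breaks = seq(0, 1, length.out = k + 1), include.lowest = TRUE)
  as.vector(tapply(x, bin, mean)) / mean(x)   # Pr(X=1 | bin) / Pr(X=1)
}

# Pointwise permutation band: shuffle the labels B times and take the
# alpha/2 and 1 - alpha/2 quantiles of the permuted densities in each bin.
null_band_sketch <- function(x, y, k = 10, B = 1000, alpha = 0.05) {
  d_hat <- binned_density(x, y, k)
  perms <- replicate(B, binned_density(sample(x), y, k))   # k x B matrix
  lower <- apply(perms, 1, quantile, probs = alpha / 2)
  upper <- apply(perms, 1, quantile, probs = 1 - alpha / 2)
  data.frame(bin = seq_len(k), d_hat = d_hat, lower = lower, upper = upper,
             above = d_hat > upper,   # group 1 over-represented in this bin
             below = d_hat < lower)   # group 1 under-represented in this bin
}

set.seed(1)
x <- c(rep(0, 200), rep(1, 200))
y <- c(rnorm(200, 0, 1), rnorm(200, 1, 1))
res <- null_band_sketch(x, y, B = 500)
res[, c("bin", "d_hat", "above", "below")]
```

A value of `d_hat` near 1 means the group appears in that quantile range about as often as it does overall; bins flagged `above` or `below` are where the two distributions differ under this simplified estimate.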
This function computes the LP comoments, LP coefficients, LPINFOR test statistic, and the corresponding p-value for testing equality of two distributions.
LP.XY(x,y,m=6,smooth="TRUE",method="BIC")
x | Indicator variable denoting group membership |
y | Response variable with measured values |
m | Number of LP comoments and LP coefficients to be calculated, default: 6 |
smooth | Whether smoothing should be applied, default: TRUE |
method | Smoothing method, default: BIC |
A list containing:
LP.comoment.0 | LP comoments, conditioned on X=0 |
LP.coef.0 | LP coefficients, conditioned on X=0 |
LP.comoment.1 | LP comoments, conditioned on X=1 |
LP.coef.1 | LP coefficients, conditioned on X=1 |
LPINFOR | Value of the LPINFOR test statistic |
pval | The p-value for testing equality of the two distributions, F0=F1 |
Subhadeep Mukhopadhyay
David Jungreis
Jungreis, D., (2019) "Unification of Continuous, Discrete, and Mixed Distribution Two-Sample Testing with Inferences in the Quantile Domain"
Mukhopadhyay, S. and Parzen, E. (2014), "LP Approach to Statistical Modeling", arXiv:1405.2601.
x <- c(rep(0,200),rep(1,200))
y <- c(rnorm(200,0,1),rnorm(200,1,1))
L <- LP.XY(x,y)
L$pval
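As a rough picture of how comoments, a summed-squares statistic, and a p-value can fit together, the sketch below follows the general LP recipe of Mukhopadhyay and Parzen (2014): cross-moments between a score function of the group indicator and score functions of the response, with the sum of their squares referred to a chi-square distribution. The helper names `score_basis` and `lp_test_sketch` are hypothetical, no AIC/BIC smoothing is applied, and the definitions are assumptions for illustration. The values returned by `LP.XY` may be defined differently.

```r
# Hypothetical helper, NOT a package function: orthonormal score functions
# built from the mid-rank transform of a sample.
score_basis <- function(v, m) {
  n <- length(v)
  u <- (rank(v, ties.method = "average") - 0.5) / n
  m <- min(m, length(unique(u)) - 1)
  sqrt(n) * matrix(as.vector(poly(u, degree = m)), nrow = n)
}

# Hypothetical LPINFOR-style test: squared cross-moments between the
# indicator score and the first m response scores, summed.
lp_test_sketch <- function(x, y, m = 6) {
  n  <- length(y)
  Tx <- score_basis(x, 1)   # a binary indicator has a single score function
  Ty <- score_basis(y, m)
  comoments <- as.vector(crossprod(Tx, Ty)) / n   # sample E[T_1(X) T_j(Y)]
  lpinfor <- sum(comoments^2)
  # Large-sample reference under equality: n * LPINFOR is approximately
  # chi-square with one degree of freedom per comoment
  pval <- pchisq(n * lpinfor, df = length(comoments), lower.tail = FALSE)
  list(comoments = comoments, LPINFOR = lpinfor, pval = pval)
}

set.seed(1)
x <- c(rep(0, 200), rep(1, 200))
y <- c(rnorm(200, 0, 1), rnorm(200, 1, 1))
res <- lp_test_sketch(x, y)
res$pval
```

In this simplified version the first comoment mostly reflects a location difference and the second a scale difference, which is the sense in which such coefficients describe how two distributions differ.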
These data come from the informal borrowing observations of Banerjee et al. (2015).
data(Microfinance)
A data frame with 6811 observations on the following 2 variables.
x
A binary indicator variable: 0 for control, 1 for intervention (access to microfinance)
y
The response variable: rupees informally borrowed
https://www.aeaweb.org/articles?id=10.1257/app.20130533
Banerjee, A., Duflo, E., Glennerster, R., and Kinnan, C. (2015), "The miracle of microfinance? Evidence from a randomized evaluation," American Economic Journal: Applied Economics, 7, 22-53.
data(Microfinance)
## maybe str(Microfinance)
y <- Microfinance[,2]
x <- Microfinance[,1]
# Remove the 0s (as Banerjee (2015) appears to have done)
ind <- which(y==0)
x <- x[-ind]
y <- y[-ind]
hist(y[x==0])
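The zero-removal step above relies on negative indexing with `which()`, which works when at least one zero is present. A logical mask is a possible alternative that also behaves sensibly when there are no zeros. The sketch below uses a small stand-in data frame named `toy` with the same two-column layout; its values are made up for illustration and are not taken from the package data.

```r
# Stand-in data frame with the same (x, y) layout; values are illustrative only
toy <- data.frame(x = rep(0:1, each = 5),
                  y = c(0, 3, 0, 7, 2, 5, 0, 1, 4, 6))

# A logical mask keeps x and y aligned and drops only the zero responses
keep <- toy$y != 0
x <- toy$x[keep]
y <- toy$y[keep]

# With no zeros present, the mask drops nothing
y_pos <- c(2, 5, 9)
length(y_pos[y_pos != 0])           # 3

# The which()-based form drops everything in that case, because
# -integer(0) is integer(0) and selects no elements
length(y_pos[-which(y_pos == 0)])   # 0
```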
These data come from the study by Venturini et al. (2015) of hospital costs for patients with smoking and non-smoking diseases.
data(NMES)
A data frame with 9416 observations on the following 2 variables.
x
A binary indicator variable: 0 for non-smoking disease, 1 for smoking disease
y
The response variable: cost of a hospital stay, in dollars
Dominici, F., Cope, L., Naiman, D. Q., and Zeger, S. L. (2005), "Smooth quantile ratio estimation," Biometrika, 92, 543-557.
Dominici, F. and Zeger, S. L. (2005), "Smooth quantile ratio estimation with regression: estimating medical expenditures for smoking-attributable diseases," Biostatistics, 6, 505-519.
Johnson, E., Dominici, F., Griswold, M., and Zeger, S. L. (2003), "Disease cases and their medical costs attributable to smoking: an analysis of the national medical expenditure survey," Journal of Econometrics, 112, 135-151.
Venturini, S., Dominici, F., Parmigiani, G., et al. (2015), "Generalized quantile treatment effect: A flexible Bayesian approach using quantile ratio smoothing," Bayesian Analysis, 10, 523-552.
data(NMES)
## maybe str(NMES)
y <- NMES[,2]
x <- NMES[,1]
# Remove the 0s (as Venturini (2015) notes was necessary)
ind <- which(y==0)
x <- x[-ind]
y <- y[-ind]
hist(y[x==0])