| Title: | An Analyzer of International Large Scale Assessments in Education |
|---|---|
| Description: | A fast way to analyze International Large-Scale Assessments (ILSAs) or any other dataset that includes replicated weights (Balanced Repeated Replication (BRR) weights, Jackknife replicate weights, ...) and/or plausible values. 'Rrepest' contains functionalities that enable you to calculate basic statistics (means, correlations, etc.), frequencies, linear regression, or any other model already implemented in R that takes a data frame and weights as parameters. It also includes options to prepare the results for publication, following the table formatting standards of the Organization for Economic Cooperation and Development (OECD). |
| Authors: | Rodolfo Ilizaliturri [aut, cre], Francesco Avvisati [aut], Francois Keslair [aut] |
| Maintainer: | Rodolfo Ilizaliturri <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.3.0 |
| Built: | 2024-10-31 21:10:49 UTC |
| Source: | CRAN |
This dataset is a subset of the PISA 2018 database produced by the OECD for the countries of France, Italy, and Mexico.
data(df_pisa18)
A data frame with 1269 rows and 1120 variables
This dataset is a subset of the TALIS 2018 database produced by the OECD for the countries of France, Italy, and Mexico.
data(df_talis18)
A data frame with 548 rows and 496 variables
Specify the statistic wanted, the target variable, and (optionally) a list of regressors
est(statistic, target, regressor = NULL)
statistic | (string vector) Accepts "mean", "var", "std", "quant", "iqr", "freq", "lm", "corr", "cov"
target | (string vector) Variable from which to obtain the estimate
regressor | (string vector) Independent variable(s) for a regression (one or more)
List of components to estimate, to be passed to Rrepest
est(c("mean","quant",.5,"corr"),c("pv1math","pv1read","Pv1SCIE"))
est(c("mean","quant",.5,"corr"),c("pv1math","pv1read","Pv1SCIE"))
Compute a data frame with frequency counts obtained from the grouped sums over 'small.level' and 'big.level', which can be used to calculate percentages.
grouped_sum_freqs(data, small.level, big.level, w = NULL)
data | (dataframe) Data to analyze
small.level | (string vector) All variables for the grouped sums, which will add up to 100
big.level | (string vector) Grouping variables; must be fully contained in small.level
w | (string) Name of the numeric variable from which to get weights (optional)
Data frame with frequencies from the grouped sums over small.level and big.level, used for computing percentages
grouped_sum_freqs(data = mtcars,small.level = c("cyl","am"),big.level = c("cyl"))
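A weighted variant of the same call, passing the name of a weight column through the optional w argument; a sketch using mtcars$wt as the weight, not taken from the package documentation:

# Weighted frequencies: "wt" is used as the weighting variable
grouped_sum_freqs(data = mtcars, small.level = c("cyl", "am"), big.level = c("cyl"), w = "wt")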
Obtain a list to be used as an argument defining the groups to be evaluated in the data
grp(group.name, column, cases)
group.name | (string) Name of the group to be displayed
column | (string) Column where the data are located
cases | (string vector) Values to be included in the group
List of groups defining group_name = column, values_in_group
append(grp("OECD Average","CNTRY",c("HUN","MEX")), grp("Europe","CNTRY",c("ITA","FRA")))
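The returned list is meant to be supplied to the group (or average) argument of Rrepest(); a sketch building on the example above (my_groups is an illustrative name, and the Rrepest() call is left schematic; see the Rrepest entry below):

# Build a list of custom groups, then pass it to Rrepest()'s group argument
my_groups <- append(grp("OECD Average", "CNTRY", c("HUN", "MEX")),
                    grp("Europe", "CNTRY", c("ITA", "FRA")))
# Rrepest(..., group = my_groups)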
Invert a test column produced by Rrepest with test = TRUE: the comparison is flipped by name in the "b." and "se." column names and by sign (multiplied by -1) in the "b." estimates.
inv_test(data, name_index)
data | (dataframe) Data frame to analyze
name_index | (string/numeric) Name or index of the estimate ("b.") columns containing the data for the test in Rrepest
Data frame containing inverted test column names for "b." and "se." according to the Rrepest structure, with the "b." column multiplied by -1
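The entry above has no example; the following is only a schematic, heavily hedged sketch. The over variable name gender_var is a stand-in for whatever variable the test was run over, and the appropriate name_index depends on the column layout of the actual Rrepest output:

# Schematic only: run Rrepest with test = TRUE, then flip the direction of the comparison
# res <- Rrepest(data = df_pisa18, svy = "PISA2015", est = est("mean", "AGE"),
#                by = "CNT", over = "gender_var", test = TRUE)
# inv_test(res, name_index = 2)  # multiply the "b." test column by -1 and rename "b."/"se."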
Number of valid observations for column x
n_obs_x(df, by, x, svy = NULL)
df | (dataframe) Data to analyze
by | (string vector) Column(s) by which to break down results
x | (string) Target variable whose valid observations are counted
svy | (string) Survey/project to analyse; must be one of ALL, IALS, IELS, PIAAC, PISA, PISA2015, PISAOOS, TALISSCH, TALISTCH
Data frame containing the number of valid observations for the target variable x
data(df_pisa18)
data(df_talis18)
n_obs_x(df = df_pisa18, by = "cnt", x = "wb173q03ha", svy = "PISA2015")
n_obs_x(df = df_talis18, by = "cntry", x = "tt3g01", svy = "TALISTCH")
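Since svy defaults to NULL in the signature above, it can presumably be omitted when no survey-specific handling is needed; a hedged sketch based on the PISA example:

# svy left at its default (NULL); counts valid observations of wb173q03ha by country
data(df_pisa18)
n_obs_x(df = df_pisa18, by = "cnt", x = "wb173q03ha")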
Estimates statistics using replicate weights (Balanced Repeated Replication (BRR) weights, Jackknife replicate weights, ...), thus accounting for complex survey designs in the estimation of sampling variances. It is especially designed to be used with the data sets produced by the Organization for Economic Cooperation and Development (OECD), such as the Programme for International Student Assessment (PISA) and the Teaching and Learning International Survey (TALIS) data sets, but it works for all International Large-Scale Assessments that use replicated weights. It also allows for analyses with multiply imputed variables (plausible values); where plausible values are included in a pvvarlist, the average estimator across plausible values is reported and the imputation error is added to the variance estimator.
Rrepest(
  data,
  svy,
  est,
  by = NULL,
  over = NULL,
  test = FALSE,
  user_na = FALSE,
  show_na = FALSE,
  flag = FALSE,
  fast = FALSE,
  tabl = FALSE,
  average = NULL,
  group = NULL,
  ...
)
data | (dataframe) Data frame to analyze
svy | (string) Survey/project to analyse; must be one of ALL, IALS, IELS, PIAAC, PISA, PISA2015, PISAOOS, TALISSCH, TALISTCH
est | (est function) Takes as arguments the statistic, the target variable, and the regressors (optional, for linear regressions)
by | (string vector) Column(s) by which to break down results
over | (string vector) Columns over which to do the analysis
test | (bool) TRUE: calculate the difference between the over variables
user_na | (bool) TRUE: show the nature of user-defined missing values for the by variables
show_na | (bool) TRUE: include NAs in the frequencies of x
flag | (bool) TRUE: show NaN when there are not enough cases (or schools)
fast | (bool) TRUE: use only 6 replicated weights
tabl | (bool) TRUE: create a flextable with all examples
average | (grp function) Takes arguments group.name, column, cases to create averages at the end of the data frame
group | (grp function) Takes arguments group.name, column, cases to create groups at the end of the data frame
... | Optional filtering parameters, e.g. isced = 2, n.pvs = 5, cm.weights = c("finw", paste0("repw", 1:22)), var.factor = 1/(0.5^2), z.score = qnorm(1 - 0.05/2)
Data frame containing the estimate ("b.") and standard error ("se.") of the desired statistics
data(df_pisa18)
Rrepest(data = df_pisa18, svy = "PISA2015", est = est("mean", "AGE"), by = c("CNT"))
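A slightly richer sketch on the same PISA subset, adding a custom average row through the average argument; the group label is illustrative, and the country codes follow the subset described above (France, Italy, Mexico):

# Mean age by country, plus an average across the three countries in the subset
Rrepest(data = df_pisa18, svy = "PISA2015",
        est = est("mean", "AGE"),
        by = "CNT",
        average = grp("Average (FRA-ITA-MEX)", "CNT", c("FRA", "ITA", "MEX")))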
Compute the weighted Pearson correlation coefficient of two numeric vectors
weighted.corr(x, y, w, na.rm = TRUE)
x | (numeric vector) Variable from which to compute the correlation
y | (numeric vector) Variable from which to compute the correlation
w | (numeric vector) Vector of weights
na.rm | (bool) TRUE: NAs are stripped before the computation proceeds
Pearson correlation coefficient
data(df_talis18)
weighted.corr(x = df_talis18$T3STAKE, y = df_talis18$T3TEAM, w = df_talis18$TCHWGT)
Multivariate Correlation and Covariance
weighted.corr.cov.n(data, x, w = rep(1, length(data[x[1]])), corr = TRUE, na.rm = TRUE)
data | (dataframe) Data to analyze
x | (string vector) Variable names from which to compute the correlation/covariance
w | (string) Weight variable name
corr | (bool) TRUE: compute the correlation. FALSE: compute the covariance
na.rm | (bool) TRUE: NAs are stripped before the computation proceeds
Data frame containing choose(length(x), 2) columns, one for each bivariate correlation/covariance
data(df_talis18)
weighted.corr.cov.n(df_talis18, c("T3STAKE", "T3TEAM", "T3STUD"), "TCHWGT")
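Setting corr = FALSE returns the weighted covariances instead; a sketch based on the example above:

# Pairwise weighted covariances instead of correlations
data(df_talis18)
weighted.corr.cov.n(df_talis18, c("T3STAKE", "T3TEAM", "T3STUD"), "TCHWGT", corr = FALSE)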
Compute the weighted covariance of two numeric vectors
weighted.cov(x, y, w, na.rm = TRUE)
x | (numeric vector) Variable from which to compute the covariance
y | (numeric vector) Variable from which to compute the covariance
w | (numeric vector) Vector of weights
na.rm | (bool) TRUE: NAs are stripped before the computation proceeds
Weighted covariance
data(df_talis18)
weighted.cov(x = df_talis18$T3STAKE, y = df_talis18$T3TEAM, w = df_talis18$TCHWGT)
Compute the weighted interquantile range
weighted.iqr(x, w = rep(1, length(x)), rang = c(0.25, 0.75))
x | (numeric vector) Variable from which to compute the quantiles
w | (numeric vector) Vector of weights
rang | (numeric vector) Two numbers indicating the range of the quantiles
Interquantile range
weighted.iqr(x = mtcars$mpg, w = mtcars$wt, rang = c(.5,.9))
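With rang left at its default of c(0.25, 0.75), the call gives the usual weighted interquartile range:

# Default range: weighted interquartile range (25th to 75th percentile)
weighted.iqr(x = mtcars$mpg, w = mtcars$wt)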
Computation of weighted quantiles
weighted.quant(x, w = rep(1, length(x)), q = 0.5)
x | (numeric vector) Variable from which to compute the quantiles
w | (numeric vector) Vector of weights
q | (numeric vector) Values from 0 to 1 (exclusive) for the desired quantiles
Weighted quantile of a numeric vector
weighted.quant(x = mtcars$mpg, w = mtcars$wt, q = seq(.1,.9,.1))
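With q left at its default of 0.5, the call returns the weighted median:

# Default q = 0.5: weighted median of mpg
weighted.quant(x = mtcars$mpg, w = mtcars$wt)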
Calculate the weighted standard deviation of a numeric vector
weighted.std(x, w, na.rm = TRUE)
x | (numeric vector) Variable to analyze
w | (numeric vector) Vector of weights
na.rm | (bool) TRUE: remove missing values
Scalar with the weighted standard deviation
data(df_talis18)
weighted.std(df_talis18$TT3G02, df_talis18$TRWGT1)
Calculate the weighted variance of a numeric vector
weighted.var(x, w, na.rm = TRUE)
x | (numeric vector) Variable to analyze
w | (numeric vector) Vector of weights
na.rm | (bool) TRUE: remove missing values
Scalar with the weighted variance
data(df_talis18)
weighted.var(df_talis18$TT3G02, df_talis18$TRWGT1)
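Assuming weighted.std() is simply the square root of this weighted variance (a relationship not confirmed by the documentation above), the two calls below should agree:

# Sketch: the weighted standard deviation should equal the square root of the weighted variance
data(df_talis18)
sqrt(weighted.var(df_talis18$TT3G02, df_talis18$TRWGT1))
weighted.std(df_talis18$TT3G02, df_talis18$TRWGT1)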