Title: | Draw Samples with the Desired Properties from a Data Set |
---|---|
Description: | A tool to sample data with the desired properties. Samples can be drawn by purposive sampling by specifying distributional conditions, such as the deviation from normality (skewness and kurtosis) and the sample size, in quantitative research studies. In purposive sampling, the researcher has a purpose in mind and includes the participants who fit the purpose of the study (Etikan, Musa, & Alkassim, 2015) <doi:10.11648/j.ajtas.20160501.11>. Purposive sampling can be useful for answering many research questions (Klar & Leeper, 2019) <doi:10.1002/9781119083771.ch21>. |
Authors: | Kubra Atalay Kabasakal [aut, cre], Huseyin Yıldız [ctb] |
Maintainer: | Kubra Atalay Kabasakal |
License: | MIT + file LICENSE |
Version: | 1.0.1 |
Built: | 2024-12-25 06:51:02 UTC |
Source: | CRAN |
The draw_sample functions take a sample of the specified size, skewness, and kurtosis from a data set (dist), with or without replacement. Fleishman's power method (doi:10.1007/BF02293811) is used to attain the desired skewness and kurtosis. Consequently, the skewness coefficient can be chosen between 0 and 3.6, while the admissible kurtosis coefficient depends on the skewness and ranges from -1.2 to 20. If the skewness and kurtosis values supplied are not a feasible combination, no solution can be found and an error is returned.
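Fleishman's power method transforms a standard normal variable Z into Y = a + bZ + cZ^2 + dZ^3, choosing b, c and d (with a = -c) so that Y has mean 0, variance 1 and the target skewness and kurtosis. The base-R sketch below solves those three moment equations by minimising squared residuals with optim(). The function names are hypothetical, kurtosis is assumed to mean excess (standardized) kurtosis, and this is not necessarily how drawsample computes its constants.

```r
# Residuals of Fleishman's (1978) moment equations for p = (b, c, d).
# All three are zero when Y = -c + bZ + cZ^2 + dZ^3 has variance 1 and the
# target skewness and excess kurtosis (E[Y] = 0 holds because a = -c).
fleishman_residuals <- function(p, skew, kurt) {
  b <- p[1]; cc <- p[2]; d <- p[3]
  c(
    b^2 + 6 * b * d + 2 * cc^2 + 15 * d^2 - 1,              # variance = 1
    2 * cc * (b^2 + 24 * b * d + 105 * d^2 + 2) - skew,      # skewness
    24 * (b * d + cc^2 * (1 + b^2 + 28 * b * d) +
            d^2 * (12 + 48 * b * d + 141 * cc^2 + 225 * d^2)) - kurt  # excess kurtosis
  )
}

# Hypothetical solver: minimise the sum of squared residuals from the normal
# case (b = 1, c = d = 0); a non-zero minimum means the pair is infeasible.
fleishman_constants <- function(skew, kurt, start = c(1, 0, 0)) {
  objective <- function(p) sum(fleishman_residuals(p, skew, kurt)^2)
  fit <- optim(start, objective, method = "BFGS",
               control = list(reltol = 1e-14, maxit = 1000, ndeps = rep(1e-6, 3)))
  if (max(abs(fleishman_residuals(fit$par, skew, kurt))) > 1e-4) {
    stop("no solution found for this skewness/kurtosis combination")
  }
  c(a = -fit$par[2], b = fit$par[1], c = fit$par[2], d = fit$par[3])
}

fleishman_constants(skew = 1, kurt = 1.5)
```

An infeasible pair, such as skewness 3 with kurtosis 0, leaves a non-zero residual and triggers the error, mirroring the behaviour described above.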
Maintainer: Kubra Atalay Kabasakal
Other contributors:
Huseyin Yıldız [contributor]
Fleishman, A. I. (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi:10.1007/BF02293811
Atalay Kabasakal, K., & Gunduz, T. (2020). Drawing a Sample with Desired Properties from Population in R Package “drawsample”. Journal of Measurement and Evaluation in Education and Psychology, 11(4), 405-429. doi:10.21031/epod.790449
This table contains the constants of Fleishman's power method transformation.
constants_table
A data.frame with 5 columns:
The skewness value
The standardized kurtosis value
A power method constant computed from the skewness and kurtosis
A power method constant computed from the skewness and kurtosis
A power method constant computed from the skewness and kurtosis
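Once a row of constants is available, generating a reference vector is a single polynomial evaluation. The sketch below assumes the three constant columns hold Fleishman's b, c and d (with a = -c); the helper power_transform() is hypothetical, and the non-normal constants used are illustrative values, not rows of constants_table.

```r
# Hypothetical helper: apply Fleishman's power method polynomial
# Y = a + b*Z + c*Z^2 + d*Z^3, where a = -c keeps E[Y] = 0.
power_transform <- function(z, b, c, d) {
  -c + b * z + c * z^2 + d * z^3
}

set.seed(2020)
z <- rnorm(1e5)   # standard normal deviates

# b = 1, c = d = 0 leaves Z unchanged, i.e. a normal reference vector
y_normal <- power_transform(z, b = 1, c = 0, d = 0)
all.equal(y_normal, z)   # TRUE

# Non-zero c and d give a skewed reference vector whose mean stays near 0
y <- power_transform(z, b = 0.95, c = 0.15, d = 0.01)  # illustrative constants only
mean(y)
```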
Fleishman, A. I. (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi:10.1007/BF02293811
Fialkowski, A. C. (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. Retrieved from https://cran.r-project.org/web/packages/SimMultiCorrData/index.html
# First 6 rows of the table
data(constants_table)
head(constants_table)
A function to sample data with desired properties.
draw_sample( dist, n, skew, kurts, replacement = FALSE, save.output = FALSE, output_name = c("sample", "default") )
dist |
data frame: consists of IDs and scores, with no missing values |
n |
numeric: desired sample size |
skew |
numeric: the skewness value |
kurts |
numeric: the kurtosis value |
replacement |
logical: should sampling be done with replacement? (default is FALSE). |
save.output |
logical: should the output be saved into a text file? (default is FALSE). |
output_name |
character: a vector of two components. The first component is the name of the output file; the user can change the second component. |
Execution may take some time, because the function searches for a sample that attains the specified skewness and kurtosis.
This function returns a list with the following components:
desc: a matrix of descriptive statistics for the given data, the reference vector, and the sample.
sample: a data frame with the IDs and scores of the sample.
graph: histograms of the “data” and the “sample”.
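The desc matrix can be pictured as one column per vector (data, reference, sample) and one row per statistic. The base-R sketch below builds such a matrix; the moment-based skewness and excess-kurtosis formulas, the row labels, the helper names and the toy data are assumptions, and drawsample's own estimators and layout may differ.

```r
# Moment-based sample skewness and excess kurtosis (assumed definitions)
skewness <- function(x) {
  dev <- x - mean(x)
  mean(dev^3) / mean(dev^2)^1.5
}
excess_kurtosis <- function(x) {
  dev <- x - mean(x)
  mean(dev^4) / mean(dev^2)^2 - 3
}

# Hypothetical helper: one column per named vector, one row per statistic
describe <- function(...) {
  sapply(list(...), function(x) {
    c(n = length(x), mean = mean(x), sd = sd(x),
      skewness = skewness(x), kurtosis = excess_kurtosis(x))
  })
}

set.seed(1)
data_scores <- rbeta(500, 5, 2) * 100   # toy, negatively skewed "given data"
reference   <- rnorm(200)               # toy normal reference vector
drawn       <- sample(data_scores, 200) # toy sample (simple random draw)
desc <- describe(data = data_scores, reference = reference, sample = drawn)
round(desc, 3)
```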
Fleishman, A. I. (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi:10.1007/BF02293811
Fialkowski, A. C. (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. Retrieved from https://cran.r-project.org/web/packages/SimMultiCorrData/index.html
Atalay Kabasakal, K., & Gunduz, T. (2020). Drawing a Sample with Desired Properties from Population in R Package “drawsample”. Journal of Measurement and Evaluation in Education and Psychology, 11(4), 405-429. doi:10.21031/epod.790449
# Example data provided with the package
data(example_data)
# First 6 rows of example_data
head(example_data)
# Draw a sample based on Score_1 (from negatively skewed to normal)
output1 <- draw_sample(dist = example_data[, c(1, 2)], n = 200, skew = 0, kurts = 0,
                       save.output = FALSE)
# Descriptive statistics of the given data, the reference data, and the drawn sample
output1$desc
# First 6 rows of the drawn sample
head(output1$sample)
# Histograms of the given data set and the drawn sample
output1$graph
## Not run: 
# Draw a sample based on Score_2 (from negatively skewed to positively skewed)
# draw_sample(dist = example_data[, c(1, 3)], n = 200, skew = 1, kurts = 1,
#             output_name = c("sample", "1"))
# Draw a sample based on Score_2 (from negatively skewed to positively skewed,
# with replacement)
# draw_sample(dist = example_data[, c(1, 3)], n = 200, skew = 0.5, kurts = 0.4,
#             replacement = TRUE, output_name = c("sample", "2"))
## End(Not run)
A function to sample data close to the desired characteristics, returning individual responses.
draw_sample_ir( dist, n, skew, kurts, replacement = FALSE, col_id = 1, col_total = numeric(), save.output = FALSE, output_name = c("sample", "1") )
dist |
data frame: consists of IDs and scores, with no missing values |
n |
numeric: desired sample size |
skew |
numeric: the skewness value |
kurts |
numeric: the kurtosis value |
replacement |
logical: should sampling be done with replacement? (default is FALSE). |
col_id |
numeric: index of the ID column |
col_total |
numeric: index of the total score column |
save.output |
logical: should the output be saved into a text file? (Default is FALSE). |
output_name |
character: a vector of two components. The first component is the name of the output file; the user can change the second component. |
Execution may take some time, because the function searches for a sample that attains the specified skewness and kurtosis.
This function returns a list with the following components:
desc: a matrix of descriptive statistics for the given data, the reference vector, and the sample.
sample: a data frame with the IDs and individual responses of the sample.
graph: histograms of the “data” and the “sample”.
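With individual responses, selection still operates on one column (the total score given by col_total), and the complete rows are then returned. The sketch below shows only that bookkeeping step on toy data laid out like likert_example (an ID column, five items and a total); the selection rule is a hypothetical stand-in for the package's skewness/kurtosis-targeted search.

```r
set.seed(42)
# Toy item-level data: ID, five 1-5 Likert items and their total (7 columns)
items <- matrix(sample(1:5, 20 * 5, replace = TRUE), nrow = 20,
                dimnames = list(NULL, paste0("item_", 1:5)))
responses <- data.frame(id = 1:20, items, total = rowSums(items))

col_id <- 1      # position of the ID column, as in the col_id argument
col_total <- 7   # position of the total score column, as in col_total

# Hypothetical stand-in for the selection step: the 5 cases with the lowest totals
selected_ids <- responses[[col_id]][order(responses[[col_total]])[1:5]]

# Return the complete rows (ID, item responses, total) of the selected cases
sample_ir <- responses[match(selected_ids, responses[[col_id]]), ]
sample_ir
```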
Fleishman, A. I. (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi:10.1007/BF02293811
Fialkowski, A. C. (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. Retrieved from https://cran.r-project.org/web/packages/SimMultiCorrData/index.html
Atalay Kabasakal, K., & Gunduz, T. (2020). Drawing a Sample with Desired Properties from Population in R Package “drawsample”. Journal of Measurement and Evaluation in Education and Psychology, 11(4), 405-429. doi:10.21031/epod.790449
## Not run: 
# Example data provided with the package
data(likert_example)
# First 6 rows of likert_example
head(likert_example)
# Draw a sample based on the total score (target skewness 1, kurtosis 1.2)
output3 <- draw_sample_ir(dist = likert_example, n = 200, skew = 1, kurts = 1.2,
                          col_id = 1, col_total = 7, save.output = FALSE)
# Descriptive statistics of the given data, the reference data, and the drawn sample
output3$desc
# First 6 rows of the drawn sample
head(output3$sample)
# Histograms of the given data set and the drawn sample
output3$graph
# Draw a sample based on the total score (target skewness 0.5, kurtosis 0.5) and save it
draw_sample_ir(dist = likert_example, n = 200, skew = 0.5, kurts = 0.5,
               col_id = 1, col_total = 7, save.output = TRUE,
               output_name = c("sample", "3"))
## End(Not run)
A function to sample data close to the desired characteristics (nearest approximation)
draw_sample_n( dist, n, skew, kurts, location = 0, delta_var = 0, save.output = FALSE, output_name = c("sample", "default") )
dist |
data frame: consists of IDs and scores, with no missing values |
n |
numeric: desired sample size |
skew |
numeric: the skewness value |
kurts |
numeric: the kurtosis value |
location |
numeric: the value for adjusting mean (default is 0). |
delta_var |
numeric: the value for adjusting variance (default is 0). |
save.output |
logical: should the output be saved into a text file? (Default is FALSE). |
output_name |
character: a vector of two components. The first component is the name of the output file; the user can change the second component. |
Execution is faster, but the desired skewness and kurtosis values may not be met exactly. Kurtosis in particular may deviate from its target, because the range of kurtosis is wider than that of skewness.
The location argument shifts the midpoint (mean) of the reference distribution, and delta_var increases or decreases its variability. Unless the user changes the default values of location and delta_var, the reference distribution is standardized, with mean 0 and variance 1 as in the standard normal distribution.
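A "nearest" selection can be sketched as matching each value of a reference vector to the closest unused standardized score. This is a hypothetical illustration, not drawsample's code: the reference formula location + sqrt(1 + delta_var) * z is only one possible reading of the location and delta_var arguments, and a normal reference is used for brevity where the package would apply Fleishman's transformation.

```r
# Hypothetical "nearest" selection: for each reference value, take the unused
# observation whose standardized score is closest (sampling without replacement).
nearest_sample <- function(scores, reference) {
  stopifnot(length(reference) <= length(scores))
  z_scores <- as.numeric(scale(scores))
  available <- rep(TRUE, length(scores))
  picked <- integer(length(reference))
  for (i in seq_along(reference)) {
    candidates <- which(available)
    j <- candidates[which.min(abs(z_scores[candidates] - reference[i]))]
    picked[i] <- j
    available[j] <- FALSE   # each observation can be chosen only once
  }
  picked
}

set.seed(7)
scores <- rexp(1000)   # toy, positively skewed scores
location <- 0          # shifts the reference mean
delta_var <- 0         # assumed to be added to the reference variance (hypothetical)
reference <- location + sqrt(1 + delta_var) * rnorm(200)
rows <- nearest_sample(scores, reference)
head(scores[rows])
```

Greedy matching is quick but order-dependent, which is one reason a fast "nearest" approach may land close to, rather than exactly on, the target moments.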
This function returns a list with the following components:
desc: a matrix of descriptive statistics for the given data, the reference vector, and the sample.
sample: a data frame with the IDs and scores of the sample.
graph: histograms of the “data” and the “sample”.
Fleishman, A. I. (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi:10.1007/BF02293811
Fialkowski, A. C. (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. Retrieved from https://cran.r-project.org/web/packages/SimMultiCorrData/index.html
# Example data provided with the package
data(example_data)
# Draw a sample based on Score_1
output2 <- draw_sample_n(dist = example_data[, c(1, 2)], n = 200, skew = 0, kurts = 0,
                         location = 0, delta_var = 0, save.output = FALSE)
# Descriptive statistics of the given data, the reference data, and the drawn sample
output2$desc
# First 6 rows of the drawn sample
head(output2$sample)
# Histograms of the given data set and the drawn sample
output2$graph
## Not run: 
# Draw a sample based on Score_2 (location parameter)
# draw_sample_n(dist = example_data[, c(1, 3)], n = 200, skew = 1, kurts = 1,
#               location = -0.5, delta_var = 0,
#               save.output = TRUE, output_name = c("sample", "2"))
# Draw a sample based on Score_2 (delta_var parameter)
# draw_sample_n(dist = example_data[, c(1, 3)], n = 200, skew = 0.5, kurts = 0.4,
#               location = 0, delta_var = 0.3,
#               save.output = TRUE, output_name = c("sample", "3"))
## End(Not run)
A function to sample data with desired properties.
draw_sample_n_ir( dist, n, skew, kurts, location = 0, delta_var = 0, col_id = 1, col_total = numeric(), save.output = FALSE, output_name = c("sample", "default") )
dist |
data frame: consists of IDs and scores, with no missing values |
n |
numeric: desired sample size |
skew |
numeric: the skewness value |
kurts |
numeric: the kurtosis value |
location |
numeric: the value for adjusting mean (default is 0). |
delta_var |
numeric: the value for adjusting variance (default is 0). |
col_id |
numeric: index of the ID column |
col_total |
numeric: index of the total score column |
save.output |
logical: should the output be saved into a text file? (Default is FALSE). |
output_name |
character: a vector of two components. The first component is the name of the output file; the user can change the second component. |
Execution is faster, but the desired skewness and kurtosis values may not be met exactly. Kurtosis in particular may deviate from its target, because the range of kurtosis is wider than that of skewness.
This function returns a list with the following components:
desc: a matrix of descriptive statistics for the given data, the reference vector, and the sample.
sample: a data frame with the IDs and scores of the sample.
graph: histograms of the “data” and the “sample”.
Fleishman, A. I. (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi:10.1007/BF02293811
Fialkowski, A. C. (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. Retrieved from https://cran.r-project.org/web/packages/SimMultiCorrData/index.html
Atalay Kabasakal, K., & Gunduz, T. (2020). Drawing a Sample with Desired Properties from Population in R Package “drawsample”. Journal of Measurement and Evaluation in Education and Psychology, 11(4), 405-429. doi:10.21031/epod.790449
# Example data provided with the package
data(likert_example)
# First 6 rows of likert_example
head(likert_example)
# Draw a sample based on the total score (target skewness 0, kurtosis 0)
output4 <- draw_sample_n_ir(dist = likert_example, n = 200, skew = 0, kurts = 0,
                            location = 0, delta_var = 0,
                            col_id = 1, col_total = 7, save.output = FALSE)
# Descriptive statistics of the given data, the reference data, and the drawn sample
output4$desc
# First 6 rows of the drawn sample
head(output4$sample)
# Histograms of the given data set and the drawn sample
output4$graph
## Not run: 
output4 <- draw_sample_n_ir(dist = likert_example, n = 200, skew = 0.5, kurts = 0.5,
                            location = 0, delta_var = 0,
                            col_id = 1, col_total = 7, save.output = TRUE,
                            output_name = c("sample", "1"))
## End(Not run)
Multiple Sample Selection
draw_sample_rep( dist, n, rep = 1, skew, kurts, replacement = TRUE, col_id = 1, col_total = numeric(), exact = FALSE )
dist |
data frame: consists of IDs and scores, with no missing values |
n |
numeric: desired sample size |
rep |
numeric: number of replications (samples to draw) |
skew |
numeric: the skewness value |
kurts |
numeric: the kurtosis value |
replacement |
logical: should sampling be done with replacement? (default is TRUE). |
col_id |
numeric: index of the ID column |
col_total |
numeric: index of the total score column |
exact |
logical: if FALSE (default), draw_sample_n_ir is used, which is the faster, nearest-approximation version of draw_sample_ir; otherwise, draw_sample_ir is used. |
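The rep argument repeats the selection and collects the resulting samples in a list. The sketch below shows that structure with a simple random draw standing in for the targeted selection; draw_once() and draw_rep() are hypothetical helpers, not package functions.

```r
# Hypothetical helper: one simple random draw of n rows
draw_once <- function(data, n, replacement) {
  data[sample(nrow(data), n, replace = replacement), , drop = FALSE]
}

# Repeat the draw `rep` times and collect the samples in a list
draw_rep <- function(data, n, rep = 1, replacement = TRUE) {
  lapply(seq_len(rep), function(i) draw_once(data, n, replacement))
}

set.seed(3)
toy <- data.frame(id = 1:50, total = sample(5:25, 50, replace = TRUE))
samples <- draw_rep(toy, n = 20, rep = 3)
samples[[1]]
```

Each element of the list is a data frame, so the samples can be exported with write.csv() in a loop, as in the example below.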
This function returns a list with the following components:
a matrix: descriptive statistics for the given data, the reference vector, and the sample.
a data frame: the IDs and scores of the sample.
graph: histograms of the “data” and the “sample”.
# Example data provided with the package
data(likert_example)
# First 6 rows of likert_example
head(likert_example)
# Draw three samples based on the total score (target skewness 0, kurtosis 0).
# This example takes considerable computation time.
samples <- draw_sample_rep(dist = likert_example, n = 200, rep = 3, skew = 0, kurts = 0,
                           replacement = TRUE, col_id = 1, col_total = numeric(),
                           exact = FALSE)
# First sample
samples$sample[[1]]
# Second sample
samples$sample[[2]]
## Not run: 
# Export the three samples to CSV files
for (i in 1:3) {
  write.csv(samples$sample[[i]], file = paste0("sample_", i, ".csv"), row.names = FALSE)
}
## End(Not run)
Runs the package functions through a user-friendly 'shiny' interface.
draw_sample_shiny()
## Not run: 
# if (interactive()) {
#   ## Run this code to launch the 'shiny' application
#   draw_sample_shiny()
# }
## End(Not run)
The example data set contains the IDs of 500 subjects and their total scores on two different tests.
data(example_data)
A data.frame with 3 columns:
students' IDs
scores on test 1
scores on test 2
# First 6 rows of example_data
data(example_data)
head(example_data)
The example data set contains 6669 subjects and 7 variables.
data(likert_example)
A data.frame with 7 columns:
country ID
response to item_1
response to item_2
response to item_3
response to item_4
response to item_5
total score of the five items
# First 6 rows of likert_example
data(likert_example)
head(likert_example)