Package 'drawsample'

Title: Draw Samples with the Desired Properties from a Data Set
Description: A tool to sample data with the desired properties. Samples can be drawn by purposive sampling under specified distributional conditions, such as deviation from normality (skewness and kurtosis), and sample size in quantitative research studies. For purposive sampling, a researcher has something in mind and participants that fit the purpose of the study are included (Etikan, Musa, & Alkassim, 2015) <doi:10.11648/j.ajtas.20160501.11>. Purposive sampling can be useful for answering many research questions (Klar & Leeper, 2019) <doi:10.1002/9781119083771.ch21>.
Authors: Kubra Atalay Kabasakal [aut, cre], Huseyin Yıldız [ctb]
Maintainer: Kubra Atalay Kabasakal <[email protected]>
License: MIT + file LICENSE
Version: 1.0.1
Built: 2024-11-25 06:49:04 UTC
Source: CRAN

Draw Samples with the Desired Properties from a Data Set

Description

The draw_sample functions take a sample of the specified size, skewness, and kurtosis from a data set (dist), with or without replacement. Fleishman's power method (doi:10.1007/BF02293811) is used to reach the desired skewness and kurtosis. The coefficient of skewness can therefore be chosen between 0 and 3.6. The admissible kurtosis coefficient depends on the chosen skewness and ranges from -1.2 to 20. If the supplied skewness and kurtosis values are not compatible, no solution can be found and an error is raised.
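The power method mentioned above writes a non-normal variable as a cubic polynomial of a standard normal variable, Y = a + bZ + cZ^2 + dZ^3 with a = -c, and chooses b, c, and d so that Y has unit variance and the requested skewness and (excess) kurtosis. The base R sketch below only illustrates that idea and is not the package's code: the names fleishman_residuals and solve_fleishman are hypothetical, and the general-purpose optimizer optim() stands in for whatever solver the package uses. A residual norm that stays far from zero suggests that the requested pair has no solution.

```r
# Residuals of Fleishman's three moment equations for constants (b, c, d).
# All three are zero when Y = -c + b*Z + c*Z^2 + d*Z^3 has variance 1,
# skewness `skew`, and excess kurtosis `kurt`.
fleishman_residuals <- function(par, skew, kurt) {
  b <- par[1]; c1 <- par[2]; d <- par[3]
  c(
    b^2 + 6 * b * d + 2 * c1^2 + 15 * d^2 - 1,
    2 * c1 * (b^2 + 24 * b * d + 105 * d^2 + 2) - skew,
    24 * (b * d + c1^2 * (1 + b^2 + 28 * b * d) +
            d^2 * (12 + 48 * b * d + 141 * c1^2 + 225 * d^2)) - kurt
  )
}

# Hypothetical solver: minimize the sum of squared residuals with optim(),
# starting from the normal case (b = 1, c = 0, d = 0).
solve_fleishman <- function(skew, kurt) {
  fit <- optim(
    par = c(1, 0, 0),
    fn = function(p) sum(fleishman_residuals(p, skew, kurt)^2),
    method = "BFGS"
  )
  list(b = fit$par[1], c = fit$par[2], d = fit$par[3],
       residual_norm = sqrt(fit$value))
}

# The normal case needs no transformation: all residuals are zero.
fleishman_residuals(c(1, 0, 0), skew = 0, kurt = 0)

# A non-normal target; inspect residual_norm before trusting the constants.
str(solve_fleishman(skew = 1, kurt = 1.5))
```

Returning the residual norm alongside the constants keeps the check explicit, since a numerical optimizer can stop at a point that does not satisfy the equations.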

Author(s)

Maintainer: Kubra Atalay Kabasakal [email protected] (ORCID)

Other contributors:

  • Huseyin Yıldız [contributor]

References

Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi:10.1007/BF02293811.

Atalay Kabasakal, K. & Gunduz, T. (2020). Drawing a Sample with Desired Properties from Population in R Package “drawsample”. Journal of Measurement and Evaluation in Education and Psychology, 11(4), 405-429. doi:10.21031/epod.790449

Fleishman's Power Method Transformation Constants

Description

This table includes Fleishman's Power Method Transformation constants.

Usage

constants_table

Format

A data.frame with 5 columns, which are

Skew

The skewness value

Kurtosis

The standardized kurtosis value

b

Power method constant b for the given Skew and Kurtosis

c

Power method constant c for the given Skew and Kurtosis

d

Power method constant d for the given Skew and Kurtosis

References

Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi:10.1007/BF02293811.

Fialkowski, A. C. (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. Retrieved from https://cran.r-project.org/web/packages/SimMultiCorrData/index.html

See Also

find_constants

Examples

# First 6 rows of the table
data(constants_table)
head(constants_table)
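
A table with these columns can be used as a lookup: given a target Skew and Kurtosis, pick the closest row and apply its b, c, and d to standard normal draws. The sketch below assumes that usage. lookup_constants and apply_power_method are hypothetical helpers, and mini_table is a one-row stand-in (the normal case, where b = 1 and c = d = 0) so that the code runs without the package; in practice constants_table would be passed instead.

```r
# One-row stand-in with the same column names as constants_table.
mini_table <- data.frame(Skew = 0, Kurtosis = 0, b = 1, c = 0, d = 0)

# Hypothetical helper: return the row closest to the requested values.
lookup_constants <- function(tbl, skew, kurt) {
  gap <- abs(tbl$Skew - skew) + abs(tbl$Kurtosis - kurt)
  tbl[which.min(gap), , drop = FALSE]
}

# Hypothetical helper: Fleishman polynomial with intercept a = -c.
apply_power_method <- function(z, consts) {
  -consts$c + consts$b * z + consts$c * z^2 + consts$d * z^3
}

set.seed(123)
z <- rnorm(5)
consts <- lookup_constants(mini_table, skew = 0, kurt = 0)
y <- apply_power_method(z, consts)

# With the normal-case constants the draws are left unchanged.
all.equal(y, z)
```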

Draw Samples with the Desired Properties from a Data Set

Description

A function to sample data with desired properties.

Usage

draw_sample(
  dist,
  n,
  skew,
  kurts,
  replacement = FALSE,
  save.output = FALSE,
  output_name = c("sample", "default")
)

Arguments

dist

data frame: consists of IDs and scores, with no missing values

n

numeric: desired sample size

skew

numeric: the skewness value

kurts

numeric: the kurtosis value

replacement

logical: sample with or without replacement? (default is FALSE).

save.output

logical: should the output be saved into a text file? (default is FALSE).

output_name

character: a vector of two components. The first component is the name of the output file; the user can change the second component.

Details

Execution may take some time because the function tries to reach the specified skewness and kurtosis values.
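
To judge how close a drawn sample comes to the requested values, its skewness and kurtosis can be computed directly. The base R sketch below uses simple moment-based estimators and reports kurtosis as excess kurtosis (0 for a normal distribution). That choice is an assumption made for illustration; the descriptive statistics returned by the function may rely on a different estimator. The names sample_skewness and sample_kurtosis are hypothetical.

```r
# Moment-based skewness: third central moment over variance^(3/2).
sample_skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Moment-based excess kurtosis: fourth central moment over variance^2, minus 3.
sample_kurtosis <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Made-up scores, used only to show the calls.
scores <- c(2, 4, 4, 5, 7, 9, 12)
c(skew = sample_skewness(scores), kurtosis = sample_kurtosis(scores))
```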

Value

This function returns a list including the following:

  • a matrix: Descriptive statistics of the given data, the reference vector and the sample.

  • a data frame: The id's and scores of the sample

  • graph: Histograms for the “data” and the “sample”

References

Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi:10.1007/BF02293811.

Fialkowski, A. C. (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. Retrieved from https://cran.r-project.org/web/packages/SimMultiCorrData/index.html

Atalay Kabasakal, K. & Gunduz, T. (2020). Drawing a Sample with Desired Properties from Population in R Package “drawsample”. Journal of Measurement and Evaluation in Education and Psychology, 11(4), 405-429. doi:10.21031/epod.790449

Examples

# Example data provided with package
data(example_data)
# First 6 rows of the example_data
head(example_data)
# Draw a sample based on Score_1(from negatively skewed to normal)
output1 <- draw_sample(dist=example_data[,c(1,2)],n=200,skew = 0,kurts = 0,
save.output=FALSE) # Histogram of the reference data set
# descriptive statistics of the given data,reference data, and drawn sample
output1$desc
# First 6 rows of the drawn sample
head(output1$sample)
# Histogram of the given data set and drawn sample
output1$graph
## Not run: 
# Draw a sample based on Score_2 (from negatively skewed to positively skewed)
# draw_sample(dist=example_data[,c(1,3)],n=200,skew = 1,kurts = 1,
# output_name = c("sample", "1"))
# Draw a sample based on Score_2 (from negatively skewed to positively skewed
# with replacement)
# draw_sample(dist=example_data[,c(1,3)],n=200,skew = 0.5,kurts = 0.4,
# replacement=TRUE,output_name = c("sample", "2"))

## End(Not run)

Sample data with individual responses

Description

A function to sample data close to the desired characteristics, keeping the individual responses.

Usage

draw_sample_ir(
  dist,
  n,
  skew,
  kurts,
  replacement = FALSE,
  col_id = 1,
  col_total = numeric(),
  save.output = FALSE,
  output_name = c("sample", "1")
)

Arguments

dist

data frame: consists of IDs and scores, with no missing values

n

numeric: desired sample size

skew

numeric: the skewness value

kurts

numeric: the kurtosis value

replacement

logical: sample with or without replacement? (default is FALSE).

col_id

index of the ID column

col_total

index of the total score column

save.output

logical: should the output be saved into a text file? (Default is FALSE).

output_name

character: a vector of two components. The first component is the name of the output file; the user can change the second component.

Details

Execution may take some time because the function tries to reach the specified skewness and kurtosis values.

Value

This function returns a list including the following:

  • a matrix: Descriptive statistics of the given data, the reference vector and the sample.

  • a data frame: The id's and individual response of the sample.

  • graph: Histograms for the “data” and the “sample”

References

Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi:10.1007/BF02293811.

Fialkowski, A. C. (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. Retrieved from https://cran.r-project.org/web/packages/SimMultiCorrData/index.html

Atalay Kabasakal, K. & Gunduz, T. (2020). Drawing a Sample with Desired Properties from Population in R Package “drawsample”. Journal of Measurement and Evaluation in Education and Psychology, 11(4), 405-429. doi:10.21031/epod.790449

Examples

## Not run: 
# Example data provided with package
data(likert_example)
# First 6 rows of the likert_example
head(likert_example)
# Draw a sample based on total(from flattened to normal)
output3 <- draw_sample_ir(dist=likert_example,n=200,skew = 1,kurts = 1.2,
col_id=1,col_total=7,save.output = FALSE) # Histogram of the reference data set
# descriptive statistics of the given data,reference data, and drawn sample
output3$desc
# First 6 rows of the drawn sample
head(output3$sample)
# Histogram of the given data set and drawn sample
output3$graph
# Draw a sample based on total(from flattened to normal)
draw_sample_ir(dist=likert_example,n=200,skew = 0.5,kurts =0.5,
col_id=1,col_total=7,save.output = TRUE,
output_name = c("sample", "3"))

## End(Not run)

Sample data close to desired characteristics - nearest

Description

A function to sample data close to the desired characteristics (nearest match).

Usage

draw_sample_n(
  dist,
  n,
  skew,
  kurts,
  location = 0,
  delta_var = 0,
  save.output = FALSE,
  output_name = c("sample", "default")
)

Arguments

dist

data frame: consists of IDs and scores, with no missing values

n

numeric: desired sample size

skew

numeric: the skewness value

kurts

numeric: the kurtosis value

location

numeric: the value for adjusting mean (default is 0).

delta_var

numeric: the value for adjusting variance (default is 0).

save.output

logical: should the output be saved into a text file? (Default is FALSE).

output_name

character: a vector of two components. The first component is the name of the output file; the user can change the second component.

Details

The function executes faster, but the desired skewness and kurtosis values may not be met exactly. Kurtosis in particular may deviate from the target, because its range is wider than that of skewness. The location argument shifts the midpoint (mean) of the reference distribution. The delta_var argument sets how much the variability of the reference distribution is increased or decreased. In other words, unless the user changes the default values of location and delta_var, the reference distribution is generated as the standard normal distribution.
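
The idea of matching scores to a reference distribution can be sketched as follows. This is a hypothetical illustration, not the package's implementation: it assumes that location is added to the mean of the reference distribution, that delta_var is added to its variance, and that each reference value is matched to the closest standardized score that has not been used yet. It covers only the normal target (skew = 0, kurts = 0), and the name nearest_sample is made up for this example.

```r
# Hypothetical nearest-match sampler for a two-column data frame (id, score).
nearest_sample <- function(dist, n, location = 0, delta_var = 0) {
  stopifnot(n <= nrow(dist), 1 + delta_var > 0)
  z <- as.numeric(scale(dist[[2]]))           # standardized scores
  # Assumed reference vector: normal with shifted mean and adjusted variance.
  ref <- rnorm(n, mean = location, sd = sqrt(1 + delta_var))
  available <- rep(TRUE, length(z))
  picked <- integer(n)
  for (i in seq_len(n)) {
    gap <- abs(z - ref[i])
    gap[!available] <- Inf                    # each row is used at most once
    picked[i] <- which.min(gap)
    available[picked[i]] <- FALSE
  }
  dist[picked, , drop = FALSE]
}

# Made-up population of 500 scores.
set.seed(42)
population <- data.frame(id = 1:500, score = round(rnorm(500, 50, 10)))
drawn <- nearest_sample(population, n = 50, location = -0.5)
head(drawn)
```

Because every reference value simply takes the closest remaining score, there is no iterative search, which is consistent with the speed advantage described above; the price is that the sample moments are only approximately those of the reference vector.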

Value

This function returns a list including the following:

  • a matrix: Descriptive statistics of the given data, the reference vector and the sample.

  • a data frame: The id's and scores of the sample

  • graph: Histograms for the “data” and the “sample”

References

Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi:10.1007/BF02293811.

Fialkowski, A. C. (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. Retrieved from https://cran.r-project.org/web/packages/SimMultiCorrData/index.html

Examples

# Example data provided with package
data(example_data)
# Draw a sample based on Score_1
output2 <- draw_sample_n(dist=example_data[,c(1,2)],n=200,skew = 0,
kurts = 0, location=0, delta_var=0,save.output=FALSE) # Histogram of the reference data set
# descriptive statistics of the given data,reference data, and drawn sample
output2$desc
# First 6 rows of the drawn sample
head(output2$sample)
# Histogram of the given data set and drawn sample
output2$graph
## Not run: 
# Draw a sample based on Score_2 (location par)
# draw_sample_n(dist=example_data[,c(1,3)],n=200,skew = 1,kurts = 1,location=-0.5,delta_var=0,
# save.output=TRUE, output_name = c("sample", "2"))
# Draw a sample based on Score_2 (delta_var par)
# draw_sample_n(dist=example_data[,c(1,3)],n=200,skew = 0.5,kurts = 0.4,location=0,delta_var=0.3,
# save.output=TRUE, output_name = c("sample", "3"))

## End(Not run)

Sample data close to desired characteristics with individual responses - nearest

Description

A function to sample data with desired properties.

Usage

draw_sample_n_ir(
  dist,
  n,
  skew,
  kurts,
  location = 0,
  delta_var = 0,
  col_id = 1,
  col_total = numeric(),
  save.output = FALSE,
  output_name = c("sample", "default")
)

Arguments

dist

data frame: consists of IDs and scores, with no missing values

n

numeric: desired sample size

skew

numeric: the skewness value

kurts

numeric: the kurtosis value

location

numeric: the value for adjusting mean (default is 0).

delta_var

numeric: the value for adjusting variance (default is 0).

col_id

index of the ID column

col_total

index of the total score column

save.output

logical: should the output be saved into a text file? (Default is FALSE).

output_name

character: a vector of two components. The first component is the name of the output file; the user can change the second component.

Details

The function executes faster, but the desired skewness and kurtosis values may not be met exactly. Kurtosis in particular may deviate from the target, because its range is wider than that of skewness.

Value

This function returns a list including the following:

  • a matrix: Descriptive statistics of the given data, the reference vector and the sample.

  • a data frame: The id's and scores of the sample

  • graph: Histograms for the “data” and the “sample”

References

Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi:10.1007/BF02293811.

Fialkowski, A. C. (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. Retrieved from https://cran.r-project.org/web/packages/SimMultiCorrData/index.html

Atalay Kabasakal, K. & Gunduz, T. (2020). Drawing a Sample with Desired Properties from Population in R Package “drawsample”. Journal of Measurement and Evaluation in Education and Psychology, 11(4), 405-429. doi:10.21031/epod.790449

Examples

# Example data provided with package
data(likert_example)
# First 6 rows of the likert_example
head(likert_example)
# Draw a sample based on total (target: skew = 0, kurts = 0)
output4 <- draw_sample_n_ir(dist=likert_example,n=200,skew = 0,kurts = 0,
location= 0,delta_var = 0,
col_id=1,col_total=7,save.output=FALSE) # Histogram of the reference data set
# descriptive statistics of the given data,reference data, and drawn sample
output4$desc
# First 6 rows of the drawn sample
head(output4$sample)
# Histogram of the given data set and drawn sample
output4$graph
## Not run: 
output4 <- draw_sample_n_ir(dist=likert_example,n=200,skew = 0.5,kurts = 0.5,
location= 0,delta_var = 0,
col_id=1,col_total=7,save.output=TRUE,
output_name = c("sample", "1")) 

## End(Not run)

Multiple Sample Selection

Description

Multiple Sample Selection

Usage

draw_sample_rep(
  dist,
  n,
  rep = 1,
  skew,
  kurts,
  replacement = TRUE,
  col_id = 1,
  col_total = numeric(),
  exact = FALSE
)

Arguments

dist

data frame: consists of IDs and scores, with no missing values

n

numeric: desired sample size

rep

numeric: number of replications, i.e. how many samples to draw (default is 1).

skew

numeric: the skewness value

kurts

numeric: the kurtosis value

replacement

logical: sample with or without replacement? (default is TRUE).

col_id

index of the ID column

col_total

index of the total score column

exact

logical: if FALSE (the default), the function calls draw_sample_n_ir, the faster nearest-match version of draw_sample_ir.

Value

This function returns a list including the following:

  • a matrix: Descriptive statistics of the given data, the reference vector and the sample.

  • a data frame: The id's and scores of the sample

  • graph: Histograms for the “data” and the “sample”

Examples

# Example data provided with package
data(likert_example)
# First 6 rows of the likert_example
head(likert_example)
# Draw three samples (target: skew = 0, kurts = 0)
# This example takes considerable computation time.
samples <- draw_sample_rep(dist=likert_example,n=200,rep=3,skew=0,
kurts=0,replacement =TRUE,  col_id = 1,
col_total = numeric(),
exact = FALSE)
# to get first sample
samples$sample[[1]]
# to get second sample
samples$sample[[2]]
## Not run: 
# to export the 3 samples
for(i in 1:3){
 write.csv(samples$sample[[i]],row.names = FALSE,paste("sample_",i,".csv",sep=""))
 }

## End(Not run)
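
The export loop above hard-codes the number of samples. A variant that adapts to however many samples were drawn is sketched below. It assumes, as the example suggests, that the samples are held in a list of data frames; samples_list and its values are made up, and the files go to a temporary directory so that nothing is written to the working directory.

```r
# Made-up stand-in for a list of drawn samples.
samples_list <- list(
  data.frame(id = c(1, 5, 9), score = c(10, 12, 9)),
  data.frame(id = c(2, 5, 7), score = c(11, 12, 8)),
  data.frame(id = c(3, 4, 9), score = c(14, 10, 9))
)

# One file name per sample, built from the list length.
out_dir <- tempdir()
paths <- file.path(out_dir, sprintf("sample_%d.csv", seq_along(samples_list)))

# seq_along() keeps the loop in step with the number of samples.
for (i in seq_along(samples_list)) {
  write.csv(samples_list[[i]], file = paths[i], row.names = FALSE)
}

basename(paths)
```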

Draw Samples with a Shiny Application

Description

Runs the package functions through a user-friendly 'shiny' interface.

Usage

draw_sample_shiny()

Examples

## Not run: 
# if(interactive()){
## Run this code for launching the 'shiny' application
#  draw_sample_shiny()
#  }
# 
## End(Not run)

Example Data

Description

The example data set consists of the IDs of 500 subjects and their total scores on two different tests.

Usage

data(example_data)

Format

A data.frame with 3 columns, which are

ID

students' id

Score_1

Scores of test 1

Score_2

Scores of test 2

Examples

# First 6 rows of the example_data
data(example_data)
head(example_data)

Likert Example Data

Description

The example data set consists of 6669 subjects and 7 variables.

Usage

data(likert_example)

Format

A data.frame with 7 columns, which are

CNTSTUID

country ID

ST160Q01IA

response of item_1

ST160Q02IA

response of item_2

ST160Q03IA

response of item_3

ST160Q04IA

response of item_4

ST160Q05IA

response of item_5

total

total_score of five items
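
A data frame laid out like this one can be prepared for the individual-response functions by adding a total column and noting its position. The sketch below assumes that total is the plain sum of the five item responses; the data frame toy and its values are made up, and only the column names follow likert_example.

```r
# Made-up responses for three respondents.
toy <- data.frame(
  CNTSTUID   = c(101, 102, 103),
  ST160Q01IA = c(1, 3, 4),
  ST160Q02IA = c(2, 3, 4),
  ST160Q03IA = c(1, 2, 4),
  ST160Q04IA = c(2, 3, 3),
  ST160Q05IA = c(1, 4, 4)
)

# Sum the item columns row by row to obtain the total score.
item_cols <- grep("^ST160Q", names(toy))
toy$total <- rowSums(toy[, item_cols])

# Column positions that could be passed as col_id and col_total.
col_id <- which(names(toy) == "CNTSTUID")
col_total <- which(names(toy) == "total")
toy
```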

Examples

# First 6 rows of the likert_example
data(likert_example)
head(likert_example)