Title: | Data Research, Access, Governance Network : Statistical Disclosure Control |
---|---|
Description: | A tool for checking how much information is disclosed when reporting summary statistics. |
Authors: | Ben Derrick |
Maintainer: | Ben Derrick <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-11-22 06:27:32 UTC |
Source: | CRAN |
Disguises the sample mean and standard deviation via a choice of methods.
disguise(usersample, method = 2)
usersample |
A vector of all individual sample values. |
method |
Approach for disguising mean and standard deviation. (default = 2) |
*Method 1*
Randomly split the sample into two samples, A and B, of approximately equal size. For sample A, calculate and report the mean. For sample B, calculate and report the standard deviation.
*Method 2* (default)
Take a sample of size N with replacement; calculate and report mean. Repeat to calculate and report standard deviation.
*Method 3*
Generate a random number (RN1) between N/2 and N. Sample with replacement a sample size of RN1; calculate and report mean. Generate a random number (RN2) between N/2 and N. Sample with replacement a sample size of RN2; calculate and report standard deviation.
*Method 4*
As Method 3, but sampling without replacement.
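The four methods can be sketched in base R as follows. This is an illustrative sketch, not the package's source code: the function name `disguise_sketch`, the helper `random_size`, the rule that sample A receives `floor(N/2)` values, the whole-number sizes drawn uniformly from `ceiling(N/2)` to `N`, and the named-vector return value are all assumptions made for the example.

```r
# Hypothetical sketch of the four disguising methods (not the package code).
disguise_sketch <- function(usersample, method = 2) {
  n <- length(usersample)
  stopifnot(n >= 3)  # assumption: enough values for every sub-sample to have an sd

  # Assumption: "between N/2 and N" means a whole number drawn uniformly
  # from ceiling(N/2), ..., N.
  random_size <- function() {
    sizes <- ceiling(n / 2):n
    sizes[sample.int(length(sizes), 1)]
  }

  if (method == 1) {
    # Random split into two roughly equal halves A and B.
    shuffled <- usersample[sample.int(n)]
    in_a <- seq_len(floor(n / 2))
    for_mean <- shuffled[in_a]
    for_sd <- shuffled[-in_a]
  } else if (method == 2) {
    # Two independent resamples of size N, with replacement.
    for_mean <- usersample[sample.int(n, n, replace = TRUE)]
    for_sd <- usersample[sample.int(n, n, replace = TRUE)]
  } else if (method %in% c(3, 4)) {
    # Two independent resamples of random size; Method 4 drops replacement.
    with_replacement <- method == 3
    for_mean <- usersample[sample.int(n, random_size(), replace = with_replacement)]
    for_sd <- usersample[sample.int(n, random_size(), replace = with_replacement)]
  } else {
    stop("method must be 1, 2, 3 or 4")
  }

  c(mean = mean(for_mean), sd = sd(for_sd))
}
```

Because every method relies on random sampling, repeated calls give different results unless `set.seed()` is called first.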
Outputs disguised mean and disguised standard deviation.
Derrick, B., Green, L., Kember, K., Ritchie, F. & White, P., 2022, Safety in numbers: Minimum thresholding, Maximum bounds, and Little White Lies. Scottish Economic Society Annual Conference, University of Glasgow, 25th-27th April 2022.
usersample <- c(1, 1, 2, 3, 4, 4, 5)
disguise(usersample, method = 1)
disguise(usersample, method = 2)
disguise(usersample, method = 3)
disguise(usersample, method = 4)
A tool for checking how much information is disclosed when reporting summary statistics
For integer-based scales, finds the possible sets of individual sample values that are consistent with the reported summary statistics. These are revealed upon providing the sample size, minimum possible value, maximum possible value, mean, standard deviation (and optionally median).
solutions( n, min_poss, max_poss, usermean, usersd, meandp = NULL, sddp = NULL, usermed = NULL )
n |
Sample size. |
min_poss |
Minimum possible value. If the sample minimum is disclosed, it can be inserted here; otherwise use the theoretical minimum. If there is no theoretical minimum, '-Inf' can be inserted. |
max_poss |
Maximum possible value. If the sample maximum is disclosed, it can be inserted here; otherwise use the theoretical maximum. If there is no theoretical maximum, 'Inf' can be inserted. |
usermean |
Sample mean. |
usersd |
Sample standard deviation, i.e. n-1 denominator. |
meandp |
(optional, default=NULL) Number of decimal places mean is reported to, only required if including trailing zeroes. |
sddp |
(optional, default=NULL) Number of decimal places standard deviation is reported to, only required if including trailing zeroes. |
usermed |
(optional, default=NULL) Sample median. |
For use with data measured on a scale with 1 unit increments. Samuelson's inequality [1] is used to further restrict the minimum and maximum. All possible combinations within this inequality are calculated [2], provided that factorial(n+k-1)/(factorial(k)*factorial(n-1)) < 65,000,000.
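Samuelson's inequality states that every value in a sample of size n lies within s(n-1)/sqrt(n) of the sample mean, where s is the standard deviation with the n-1 denominator. A minimal sketch of the resulting bounds is given below; the function name `samuelson_bounds` is hypothetical, and how the package itself allows for rounding of the reported mean and standard deviation is not shown here.

```r
# Hypothetical helper: Samuelson's bounds on individual sample values,
# using the n-1 denominator standard deviation.
samuelson_bounds <- function(n, usermean, usersd) {
  half_width <- usersd * (n - 1) / sqrt(n)
  c(lower = usermean - half_width, upper = usermean + half_width)
}

# Three observations with mean 4.00 and standard deviation 2.00.
b <- samuelson_bounds(3, 4.00, 2.00)

# Integer values that an observation could take within these bounds.
candidates <- ceiling(b[["lower"]]):floor(b[["upper"]])
candidates  # 2 3 4 5 6
```

Even with 'min_poss = -Inf' and 'max_poss = Inf', the search space is therefore finite.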
There is no restriction on the number of decimal places input. Reporting fewer than two decimal places reduces the chance that a unique solution for all sample values is uncovered [3].
The optional arguments meandp and sddp specify the number of digits reported after the decimal point; they are required when the reported values have trailing zeroes.
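The reason trailing zeroes need to be declared separately is that R stores '4.00' and '4' as the same number, so the reported precision cannot be recovered from the value alone. The sketch below illustrates how much the declared precision narrows the range of unrounded values; `rounding_interval` is a hypothetical helper written for this illustration, not a function in the package.

```r
# R discards trailing zeroes in numeric literals.
identical(4.00, 4)  # TRUE

# Hypothetical helper: range of unrounded values consistent with a figure
# reported to dp decimal places.
rounding_interval <- function(x, dp) {
  half_unit <- 0.5 * 10^(-dp)
  c(lower = x - half_unit, upper = x + half_unit)
}

rounding_interval(4, 0)  # 3.5 to 4.5
rounding_interval(4, 2)  # 3.995 to 4.005
```

A mean reported as '4.00' is far more restrictive than one reported as '4', which is why supplying the number of decimal places can reduce the set of solutions.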
Outputs possible combinations of original integer sample values.
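To illustrate the idea behind such a search, the sketch below enumerates every possible integer sample by brute force and keeps those whose rounded mean and standard deviation match the reported values. It is not the package's implementation: the function name `brute_force_solutions` and its `dp` argument are hypothetical, finite bounds are assumed, and the enumeration is only practical for small samples and short scales.

```r
# Hypothetical brute-force search (not the package code); assumes finite bounds.
brute_force_solutions <- function(n, min_poss, max_poss, usermean, usersd, dp = 3) {
  k <- max_poss - min_poss + 1  # number of distinct scale points

  # Stars-and-bars bijection: each strictly increasing n-subset of
  # 1..(n + k - 1) corresponds to exactly one non-decreasing sample of size n.
  idx <- combn(n + k - 1, n)
  samples <- t(idx - (seq_len(n) - 1)) + (min_poss - 1)

  m <- rowMeans(samples)
  s <- apply(samples, 1, sd)

  # Keep samples whose statistics round to the reported figures.
  keep <- abs(round(m, dp) - usermean) < 1e-8 &
    abs(round(s, dp) - usersd) < 1e-8
  samples[keep, , drop = FALSE]
}

# Seven observations on a scale coded 1 to 5, mean 2.857, sd 1.574.
res <- brute_force_solutions(7, 1, 5, 2.857, 1.574)
res
# 1 1 2 3 4 4 5
# 1 2 2 2 3 5 5
```

Each row of the result is one candidate sample, sorted in non-decreasing order.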
[1] Samuelson, P.A., 1968, How deviant can you be? Journal of the American Statistical Association, Vol 63, 1522-1525.
[2] Allenby, R.B. and Slomson, A., 2010. How to count: An introduction to combinatorics. Chapman and Hall/CRC.
[3] Derrick, B., Green, L., Kember, K., Ritchie, F. & White, P., 2022, Safety in numbers: Minimum thresholding, Maximum bounds, and Little White Lies. Scottish Economic Society Annual Conference, University of Glasgow, 25th-27th April 2022.
# EXAMPLE 1
# Seven observations are taken from a five-point Likert scale (coded 1 to 5).
# The reported mean is 2.857 and the reported standard deviation is 1.574.
solutions(7,1,5,2.857,1.574)
# For this mean and standard deviation there are two possible distributions:
# 1 1 2 3 4 4 5
# 1 2 2 2 3 5 5
# Optionally adding median value of 3.
solutions(7,1,5,2.857,1.574, usermed=3)
# uniquely reveals the raw sample values:
# 1 1 2 3 4 4 5

# EXAMPLE 2
# The mean is '4.00'.
# The standard deviation is '2.00'.
# Narrower set of solutions found specifying 2dp including trailing zeroes.
solutions(3,-Inf,Inf,4.00,2.00,2,2)
# uniquely reveals the raw sample values:
# 2 4 6