Package 'Kernelheaping' reference manual

Title:	Kernel Density Estimation for Heaped and Rounded Data
Description:	In self-reported or anonymised data the user often encounters heaped data, i.e. data which are rounded (to a possibly different degree of coarseness). While this is mostly a minor problem in parametric density estimation the bias can be very large for non-parametric methods such as kernel density estimation. This package implements a partly Bayesian algorithm treating the true unknown values as additional parameters and estimates the rounding parameters to give a corrected kernel density estimate. It supports various standard bandwidth selection methods. Varying rounding probabilities (depending on the true value) and asymmetric rounding is estimable as well: Gross, M. and Rendtel, U. (2016) (<doi:10.1093/jssam/smw011>). Additionally, bivariate non-parametric density estimation for rounded data, Gross, M. et al. (2016) (<doi:10.1111/rssa.12179>), as well as data aggregated on areas is supported.
Authors:	Marcus Gross [aut, cre], Lukas Fuchs [aut], Kerstin Erfurth [ctb]
Maintainer:	Marcus Gross <[email protected]>
License:	GPL-2 \| GPL-3
Version:	2.3.0
Built:	2025-01-26 06:50:42 UTC
Source:	CRAN

Create heaped data for Simulation

Description

Create heaped data for Simulation

Usage

createSim.Kernelheaping(
  n,
  distribution,
  rounds,
  thresholds,
  offset = 0,
  downbias = 0.5,
  Beta = 0,
  ...
)
createSim.Kernelheaping(
  n,
  distribution,
  rounds,
  thresholds,
  offset = 0,
  downbias = 0.5,
  Beta = 0,
  ...
)

Arguments

`n`	sample size
`distribution`	name of the distribution where random sampling is available, e.g. "norm"
`rounds`	rounding values
`thresholds`	rounding thresholds (for Beta=0)
`offset`	certain value added to all observed random samples
`downbias`	bias parameter
`Beta`	acceleration paramter
`...`	additional attributes handed over to "rdistribution" (i.e. rnorm, rgamma,..)

Value

List of heaped values, true values and input parameters

Bivariate kernel density estimation for rounded data

Description

Bivariate kernel density estimation for rounded data

Usage

dbivr(
  xrounded,
  roundvalue,
  burnin = 2,
  samples = 5,
  adaptive = FALSE,
  gridsize = 200
)
dbivr(
  xrounded,
  roundvalue,
  burnin = 2,
  samples = 5,
  adaptive = FALSE,
  gridsize = 200
)

Arguments

`xrounded`	rounded values from which to estimate bivariate density, matrix with 2 columns (x,y)
`roundvalue`	rounding value (side length of square in that the true value lies around the rounded one)
`burnin`	burn-in sample size
`samples`	sampling iteration size
`adaptive`	set to TRUE for adaptive bandwidth
`gridsize`	number of evaluation grid points

Value

The function returns a list object with the following objects (besides all input objects):

`Mestimates`	kde object containing the corrected density estimate
`gridx`	Vector Grid on which density is evaluated (x)
`gridy`	Vector Grid on which density is evaluated (y)
`resultDensity`	Array with Estimated Density for each iteration
`resultX`	Matrix of true latent values X estimates
`delaigle`	Matrix of Delaigle estimator estimates

Examples

# Create Mu and Sigma  -----------------------------------------------------------
mu1 <- c(0, 0)
mu2 <- c(5, 3)
mu3 <- c(-4, 1)
Sigma1 <- matrix(c(4, 3, 3, 4), 2, 2)
Sigma2 <- matrix(c(3, 0.5, 0.5, 1), 2, 2)
Sigma3 <- matrix(c(5, 4, 4, 6), 2, 2)
# Mixed Normal Distribution -------------------------------------------------------
mus <- rbind(mu1, mu2, mu3)
Sigmas <- rbind(Sigma1, Sigma2, Sigma3)
props <- c(1/3, 1/3, 1/3)
## Not run: xtrue=rmvnorm.mixt(n=1000, mus=mus, Sigmas=Sigmas, props=props)
roundvalue=2
xrounded=plyr::round_any(xtrue,roundvalue)
est <- dbivr(xrounded,roundvalue=roundvalue,burnin=5,samples=10)

#Plot corrected and Naive distribution
plot(est,trueX=xtrue)
#for comparison: plot true density
 dens=dmvnorm.mixt(x=expand.grid(est$Mestimates$eval.points[[1]],est$Mestimates$eval.points[[2]]),
  mus=mus, Sigmas=Sigmas, props=props)
 dens=matrix(dens,nrow=length(est$gridx),ncol=length(est$gridy))
 contour(dens,x=est$Mestimates$eval.points[[1]],y=est$Mestimates$eval.points[[2]],
    xlim=c(min(est$gridx),max(est$gridx)),ylim=c(min(est$gridy),max(est$gridy)),main="True Density")
## End(Not run)
# Create Mu and Sigma  -----------------------------------------------------------
mu1 <- c(0, 0)
mu2 <- c(5, 3)
mu3 <- c(-4, 1)
Sigma1 <- matrix(c(4, 3, 3, 4), 2, 2)
Sigma2 <- matrix(c(3, 0.5, 0.5, 1), 2, 2)
Sigma3 <- matrix(c(5, 4, 4, 6), 2, 2)
# Mixed Normal Distribution -------------------------------------------------------
mus <- rbind(mu1, mu2, mu3)
Sigmas <- rbind(Sigma1, Sigma2, Sigma3)
props <- c(1/3, 1/3, 1/3)
## Not run: xtrue=rmvnorm.mixt(n=1000, mus=mus, Sigmas=Sigmas, props=props)
roundvalue=2
xrounded=plyr::round_any(xtrue,roundvalue)
est <- dbivr(xrounded,roundvalue=roundvalue,burnin=5,samples=10)

#Plot corrected and Naive distribution
plot(est,trueX=xtrue)
#for comparison: plot true density
 dens=dmvnorm.mixt(x=expand.grid(est$Mestimates$eval.points[[1]],est$Mestimates$eval.points[[2]]),
  mus=mus, Sigmas=Sigmas, props=props)
 dens=matrix(dens,nrow=length(est$gridx),ncol=length(est$gridy))
 contour(dens,x=est$Mestimates$eval.points[[1]],y=est$Mestimates$eval.points[[2]],
    xlim=c(min(est$gridx),max(est$gridx)),ylim=c(min(est$gridy),max(est$gridy)),main="True Density")
## End(Not run)

Kernel density estimation for classified data

Description

Kernel density estimation for classified data

Usage

dclass(
  xclass,
  burnin = 2,
  samples = 5,
  boundary = FALSE,
  bw = "nrd0",
  evalpoints = 200,
  adjust = 1,
  dFunc = NULL
)
dclass(
  xclass,
  burnin = 2,
  samples = 5,
  boundary = FALSE,
  bw = "nrd0",
  evalpoints = 200,
  adjust = 1,
  dFunc = NULL
)

Arguments

`xclass`	classified values; matrix with two columns: lower and upper value
`burnin`	burn-in sample size
`samples`	sampling iteration size
`boundary`	TRUE for positive only data (no positive density for negative values)
`bw`	bandwidth selector method, defaults to "nrd0" see `density` for more options
`evalpoints`	number of evaluation grid points
`adjust`	as in `density`, the user can multiply the bandwidth by a certain factor such that bw=adjust*bw
`dFunc`	character optional density (with "d", "p" and "q" functions) function name for parametric estimation such as "norm" "gamma" or "lnorm"

Value

The function returns a list object with the following objects (besides all input objects):

`Mestimates`	kde object containing the corrected density estimate
`gridx`	Vector Grid on which density is evaluated
`resultDensity`	Matrix with Estimated Density for each iteration
`resultX`	Matrix of true latent values X estimates

Examples

x=rlnorm(500, meanlog = 8, sdlog = 1)
classes <- c(0,500,1000,1500,2000,2500,3000,4000,5000,6000,8000,10000,15000,Inf)
xclass <- cut(x,breaks=classes)
xclass <- cbind(classes[as.numeric(xclass)], classes[as.numeric(xclass) + 1])
densityEst <- dclass(xclass=xclass, burnin=20, samples=50, evalpoints=1000)
plot(densityEst$Mestimates~densityEst$gridx ,lwd=2, type = "l")
x=rlnorm(500, meanlog = 8, sdlog = 1)
classes <- c(0,500,1000,1500,2000,2500,3000,4000,5000,6000,8000,10000,15000,Inf)
xclass <- cut(x,breaks=classes)
xclass <- cbind(classes[as.numeric(xclass)], classes[as.numeric(xclass) + 1])
densityEst <- dclass(xclass=xclass, burnin=20, samples=50, evalpoints=1000)
plot(densityEst$Mestimates~densityEst$gridx ,lwd=2, type = "l")

Kernel density estimation for heaped data

Description

Kernel density estimation for heaped data

Usage

dheaping(
  xheaped,
  rounds,
  burnin = 5,
  samples = 10,
  setBias = FALSE,
  weights = NULL,
  bw = "nrd0",
  boundary = FALSE,
  unequal = FALSE,
  random = FALSE,
  adjust = 1,
  recall = F,
  recallParams = c(1/3, 1/3)
)
dheaping(
  xheaped,
  rounds,
  burnin = 5,
  samples = 10,
  setBias = FALSE,
  weights = NULL,
  bw = "nrd0",
  boundary = FALSE,
  unequal = FALSE,
  random = FALSE,
  adjust = 1,
  recall = F,
  recallParams = c(1/3, 1/3)
)

Arguments

`xheaped`	heaped values from which to estimate density of x
`rounds`	rounding values, numeric vector of length >=1
`burnin`	burn-in sample size
`samples`	sampling iteration size
`setBias`	if TRUE a rounding Bias parameter is estimated. For values above 0.5, the respondents are more prone to round down, while for values < 0.5 they are more likely to round up
`weights`	optional numeric vector of sampling weights
`bw`	bandwidth selector method, defaults to "nrd0" see `density` for more options
`boundary`	TRUE for positive only data (no positive density for negative values)
`unequal`	if TRUE a probit model is fitted for the rounding probabilities with log(true value) as regressor
`random`	if TRUE a random effect probit model is fitted for rounding probabilities
`adjust`	as in `density`, the user can multiply the bandwidth by a certain factor such that bw=adjust*bw
`recall`	if TRUE a recall error is introduced to the heaping model
`recallParams`	recall error model parameters expression(nu) and expression(eta). Default is c(1/3, 1/3)

Value

The function returns a list object with the following objects (besides all input objects):

`meanPostDensity`	Vector of Mean Posterior Density
`gridx`	Vector Grid on which density is evaluated
`resultDensity`	Matrix with Estimated Density for each iteration
`resultRR`	Matrix with rounding probability threshold values for each iteration (on probit scale)
`resultBias`	Vector with estimated Bias parameter for each iteration
`resultBeta`	Vector with estimated Beta parameter for each iteration
`resultX`	Matrix of true latent values X estimates

Examples

#Simple Rounding   ----------------------------------------------------------
xtrue=rnorm(3000)
xrounded=round(xtrue)
est <- dheaping(xrounded,rounds=1,burnin=20,samples=50)
plot(est,trueX=xtrue)

#####################
#####Heaping
#####################
#Real Data Example  ----------------------------------------------------------
# Student learning hours per week
data(students)
xheaped <- as.numeric(na.omit(students$StudyHrs))
## Not run: est <- dheaping(xheaped,rounds=c(1,2,5,10), boundary=TRUE, unequal=TRUE,burnin=20,samples=50)
plot(est)
summary(est)
## End(Not run)

#Simulate Data   ----------------------------------------------------------
Sim1 <- createSim.Kernelheaping(n=500, distribution="norm",rounds=c(1,10,100),
thresholds=c(-0.5244005, 0.5244005), sd=100)
## Not run: est <- dheaping(Sim1$xheaped,rounds=Sim1$rounds)
plot(est,trueX=Sim1$x)
## End(Not run)

#Biased rounding
Sim2 <- createSim.Kernelheaping(n=500, distribution="gamma",rounds=c(1,2,5,10),
                     thresholds=c(-1.2815516, -0.6744898, 0.3853205),downbias=0.2,
                     shape=4,scale=8,offset=45)
## Not run: est <- dheaping(Sim2$xheaped, rounds=Sim2$rounds, setBias=T, bw="SJ")
plot(est, trueX=Sim2$x)
summary(est)
tracePlots(est)
## End(Not run)

Sim3 <- createSim.Kernelheaping(n=500, distribution="gamma",rounds=c(1,2,5,10),
thresholds=c(1.84, 2.64, 3.05), downbias=0.75, Beta=-0.5, shape=4, scale=8)
## Not run: est <- dheaping(Sim3$xheaped,rounds=Sim3$rounds,boundary=TRUE,unequal=TRUE,setBias=T)
plot(est,trueX=Sim3$x)
## End(Not run)
#Simple Rounding   ----------------------------------------------------------
xtrue=rnorm(3000)
xrounded=round(xtrue)
est <- dheaping(xrounded,rounds=1,burnin=20,samples=50)
plot(est,trueX=xtrue)

#####################
#####Heaping
#####################
#Real Data Example  ----------------------------------------------------------
# Student learning hours per week
data(students)
xheaped <- as.numeric(na.omit(students$StudyHrs))
## Not run: est <- dheaping(xheaped,rounds=c(1,2,5,10), boundary=TRUE, unequal=TRUE,burnin=20,samples=50)
plot(est)
summary(est)
## End(Not run)

#Simulate Data   ----------------------------------------------------------
Sim1 <- createSim.Kernelheaping(n=500, distribution="norm",rounds=c(1,10,100),
thresholds=c(-0.5244005, 0.5244005), sd=100)
## Not run: est <- dheaping(Sim1$xheaped,rounds=Sim1$rounds)
plot(est,trueX=Sim1$x)
## End(Not run)

#Biased rounding
Sim2 <- createSim.Kernelheaping(n=500, distribution="gamma",rounds=c(1,2,5,10),
                     thresholds=c(-1.2815516, -0.6744898, 0.3853205),downbias=0.2,
                     shape=4,scale=8,offset=45)
## Not run: est <- dheaping(Sim2$xheaped, rounds=Sim2$rounds, setBias=T, bw="SJ")
plot(est, trueX=Sim2$x)
summary(est)
tracePlots(est)
## End(Not run)

Sim3 <- createSim.Kernelheaping(n=500, distribution="gamma",rounds=c(1,2,5,10),
thresholds=c(1.84, 2.64, 3.05), downbias=0.75, Beta=-0.5, shape=4, scale=8)
## Not run: est <- dheaping(Sim3$xheaped,rounds=Sim3$rounds,boundary=TRUE,unequal=TRUE,setBias=T)
plot(est,trueX=Sim3$x)
## End(Not run)

3d Kernel density estimation for data classified in polygons or shapes

Description

3d Kernel density estimation for data classified in polygons or shapes

Usage

dshape3dProp(
  data,
  burnin = 2,
  samples = 5,
  shapefile,
  gridsize = 200,
  boundary = FALSE,
  deleteShapes = NULL,
  fastWeights = TRUE,
  numChains = 1,
  numThreads = 1
)
dshape3dProp(
  data,
  burnin = 2,
  samples = 5,
  shapefile,
  gridsize = 200,
  boundary = FALSE,
  deleteShapes = NULL,
  fastWeights = TRUE,
  numChains = 1,
  numThreads = 1
)

Arguments

`data`	data.frame with 5 columns: x-coordinate, y-coordinate (i.e. center of polygon) and number of observations in area for partial population and number of observations for complete observations and third variable (numeric).
`burnin`	burn-in sample size
`samples`	sampling iteration size
`shapefile`	shapefile with number of polygons equal to nrow(data) / length(unique(data[,5]))
`gridsize`	number of evaluation grid points
`boundary`	boundary corrected kernel density estimate?
`deleteShapes`	shapefile containing areas without observations
`fastWeights`	if TRUE weigths for boundary estimation are only computed for first 10 percent of samples to speed up computation
`numChains`	number of chains of SEM algorithm
`numThreads`	number of threads to be used (only applicable if more than one chains)

Bivariate Kernel density estimation for data classified in polygons or shapes

Description

Bivariate Kernel density estimation for data classified in polygons or shapes

Usage

dshapebivr(
  data,
  burnin = 2,
  samples = 5,
  adaptive = FALSE,
  shapefile,
  gridsize = 200,
  boundary = FALSE,
  deleteShapes = NULL,
  fastWeights = TRUE,
  numChains = 1,
  numThreads = 1
)
dshapebivr(
  data,
  burnin = 2,
  samples = 5,
  adaptive = FALSE,
  shapefile,
  gridsize = 200,
  boundary = FALSE,
  deleteShapes = NULL,
  fastWeights = TRUE,
  numChains = 1,
  numThreads = 1
)

Arguments

`data`	data.frame with 3 columns: x-coordinate, y-coordinate (i.e. center of polygon) and number of observations in area.
`burnin`	burn-in sample size
`samples`	sampling iteration size
`adaptive`	TRUE for adaptive kernel density estimation
`shapefile`	shapefile with number of polygons equal to nrow(data)
`gridsize`	number of evaluation grid points
`boundary`	boundary corrected kernel density estimate?
`deleteShapes`	shapefile containing areas without observations
`fastWeights`	if TRUE weigths for boundary estimation are only computed for first 10 percent of samples to speed up computation
`numChains`	number of chains of SEM algorithm
`numThreads`	number of threads to be used (only applicable if more than one chains)

Value

The function returns a list object with the following objects (besides all input objects):

`Mestimates`	kde object containing the corrected density estimate
`gridx`	Vector Grid of x-coordinates on which density is evaluated
`gridy`	Vector Grid of y-coordinates on which density is evaluated
`resultDensity`	Matrix with Estimated Density for each iteration
`resultX`	Matrix of true latent values X estimates

Examples

## Not run: 
library(maptools)

# Read Shapefile of Berlin Urban Planning Areas (download available from:
# https://www.statistik-berlin-brandenburg.de/opendata/RBS_OD_LOR_2015_12.zip)
Berlin <- rgdal::readOGR("X:/SomeDir/RBS_OD_LOR_2015_12.shp") #(von daten.berlin.de)

# Get Dataset of Berlin Population (download available from:
# https://www.statistik-berlin-brandenburg.de/opendata/EWR201512E_Matrix.csv)
data <- read.csv2("X:/SomeDir/EWR201512E_Matrix.csv")

# Form Dataset for Estimation Process
dataIn <- cbind(t(sapply(1:length(Berlin@polygons),
 function(x) Berlin@polygons[[x]]@labpt)), data$E_E65U80)

#Estimate Bivariate Density
Est <- dshapebivr(data = dataIn, burnin = 5, samples = 10, adaptive = FALSE,
                 shapefile = Berlin, gridsize = 325, boundary = TRUE)
## End(Not run)

# Plot Density over Area:
## Not run: breaks <- seq(1E-16,max(Est$Mestimates$estimate),length.out = 20)
image.plot(x=Est$Mestimates$eval.points[[1]],y=Est$Mestimates$eval.points[[2]],
          z=Est$Mestimates$estimate, asp=1, breaks = breaks,
          col =  colorRampPalette(brewer.pal(9,"YlOrRd"))(length(breaks)-1))
plot(Berlin, add=TRUE)
## End(Not run)
## Not run: 
library(maptools)

# Read Shapefile of Berlin Urban Planning Areas (download available from:
# https://www.statistik-berlin-brandenburg.de/opendata/RBS_OD_LOR_2015_12.zip)
Berlin <- rgdal::readOGR("X:/SomeDir/RBS_OD_LOR_2015_12.shp") #(von daten.berlin.de)

# Get Dataset of Berlin Population (download available from:
# https://www.statistik-berlin-brandenburg.de/opendata/EWR201512E_Matrix.csv)
data <- read.csv2("X:/SomeDir/EWR201512E_Matrix.csv")

# Form Dataset for Estimation Process
dataIn <- cbind(t(sapply(1:length(Berlin@polygons),
 function(x) Berlin@polygons[[x]]@labpt)), data$E_E65U80)

#Estimate Bivariate Density
Est <- dshapebivr(data = dataIn, burnin = 5, samples = 10, adaptive = FALSE,
                 shapefile = Berlin, gridsize = 325, boundary = TRUE)
## End(Not run)

# Plot Density over Area:
## Not run: breaks <- seq(1E-16,max(Est$Mestimates$estimate),length.out = 20)
image.plot(x=Est$Mestimates$eval.points[[1]],y=Est$Mestimates$eval.points[[2]],
          z=Est$Mestimates$estimate, asp=1, breaks = breaks,
          col =  colorRampPalette(brewer.pal(9,"YlOrRd"))(length(breaks)-1))
plot(Berlin, add=TRUE)
## End(Not run)

Bivariate Kernel density estimation for data classified in polygons or shapes

Description

Bivariate Kernel density estimation for data classified in polygons or shapes

Usage

dshapebivrProp(
  data,
  burnin = 2,
  samples = 5,
  adaptive = FALSE,
  shapefile,
  gridsize = 200,
  boundary = FALSE,
  deleteShapes = NULL,
  fastWeights = TRUE,
  numChains = 1,
  numThreads = 1
)
dshapebivrProp(
  data,
  burnin = 2,
  samples = 5,
  adaptive = FALSE,
  shapefile,
  gridsize = 200,
  boundary = FALSE,
  deleteShapes = NULL,
  fastWeights = TRUE,
  numChains = 1,
  numThreads = 1
)

Arguments

`data`	data.frame with 4 columns: x-coordinate, y-coordinate (i.e. center of polygon) and number of observations in area for partial population and number of observations for complete observations.
`burnin`	burn-in sample size
`samples`	sampling iteration size
`adaptive`	TRUE for adaptive kernel density estimation
`shapefile`	shapefile with number of polygons equal to nrow(data)
`gridsize`	number of evaluation grid points
`boundary`	boundary corrected kernel density estimate?
`deleteShapes`	shapefile containing areas without observations
`fastWeights`	if TRUE weigths for boundary estimation are only computed for first 10 percent of samples to speed up computation
`numChains`	number of chains of SEM algorithm
`numThreads`	number of threads to be used (only applicable if more than one chains)

Examples

## Not run: 
library(maptools)

# Read Shapefile of Berlin Urban Planning Areas (download available from:
  https://www.statistik-berlin-brandenburg.de/opendata/RBS_OD_LOR_2015_12.zip)
Berlin <- rgdal::readOGR("X:/SomeDir/RBS_OD_LOR_2015_12.shp") #(von daten.berlin.de)

# Get Dataset of Berlin Population (download available from:
# https://www.statistik-berlin-brandenburg.de/opendata/EWR201512E_Matrix.csv)
data <- read.csv2("X:/SomeDir/EWR201512E_Matrix.csv")

# Form Dataset for Estimation Process
dataIn <- cbind(t(sapply(1:length(Berlin@polygons),
function(x) Berlin@polygons[[x]]@labpt)), data$E_E65U80, data$E_E)

#Estimate Bivariate Proportions (may take some minutes)
PropEst <- dshapebivrProp(data = dataIn, burnin = 5, samples = 20, adaptive = FALSE,
shapefile = Berlin, gridsize=325, numChains = 16, numThreads = 4)
## End(Not run)

# Plot Proportions over Area:
## Not run: 
breaks <- seq(0,0.4,by=0.025)
image.plot(x=PropEst$Mestimates$eval.points[[1]],y=PropEst$Mestimates$eval.points[[2]],
          z=PropEst$proportion+1E-96, asp=1, breaks = breaks,
          col =  colorRampPalette(brewer.pal(9,"YlOrRd"))(length(breaks)-1))
plot(Berlin, add=TRUE)
## End(Not run)
## Not run: 
library(maptools)

# Read Shapefile of Berlin Urban Planning Areas (download available from:
  https://www.statistik-berlin-brandenburg.de/opendata/RBS_OD_LOR_2015_12.zip)
Berlin <- rgdal::readOGR("X:/SomeDir/RBS_OD_LOR_2015_12.shp") #(von daten.berlin.de)

# Get Dataset of Berlin Population (download available from:
# https://www.statistik-berlin-brandenburg.de/opendata/EWR201512E_Matrix.csv)
data <- read.csv2("X:/SomeDir/EWR201512E_Matrix.csv")

# Form Dataset for Estimation Process
dataIn <- cbind(t(sapply(1:length(Berlin@polygons),
function(x) Berlin@polygons[[x]]@labpt)), data$E_E65U80, data$E_E)

#Estimate Bivariate Proportions (may take some minutes)
PropEst <- dshapebivrProp(data = dataIn, burnin = 5, samples = 20, adaptive = FALSE,
shapefile = Berlin, gridsize=325, numChains = 16, numThreads = 4)
## End(Not run)

# Plot Proportions over Area:
## Not run: 
breaks <- seq(0,0.4,by=0.025)
image.plot(x=PropEst$Mestimates$eval.points[[1]],y=PropEst$Mestimates$eval.points[[2]],
          z=PropEst$proportion+1E-96, asp=1, breaks = breaks,
          col =  colorRampPalette(brewer.pal(9,"YlOrRd"))(length(breaks)-1))
plot(Berlin, add=TRUE)
## End(Not run)

Kernel Density Estimation for Heaped Data

Description

In self-reported or anonymized data the user often encounters heaped data, i.e. data which are rounded (to a possibly different degree of coarseness). While this is mostly a minor problem in parametric density estimation the bias can be very large for non-parametric methods such as kernel density estimation. This package implements a partly Bayesian algorithm treating the true unknown values as additional parameters and estimates the rounding parameters to give a corrected kernel density estimate. It supports various standard bandwidth selection methods. Varying rounding probabilities (depending on the true value) and asymmetric rounding is estimable as well. Additionally, bivariate non-parametric density estimation for rounded data is supported.

Details

The most important function is dheaping. See the help and the attached examples on how to use the package.

Plot Kernel density estimate of heaped data naively and corrected by partly bayesian model

Description

Plot Kernel density estimate of heaped data naively and corrected by partly bayesian model

Usage

## S3 method for class 'bivrounding'
plot(x, trueX = NULL, ...)
## S3 method for class 'bivrounding'
plot(x, trueX = NULL, ...)

Arguments

`x`	bivrounding object produced by `dbivr` function
`trueX`	optional, if true values X are known (in simulations, for example) the 'Oracle' density estimate is added as well
`...`	additional arguments given to standard plot function

Value

plot with Kernel density estimates (Naive, Corrected and True (if provided))

Plot Kernel density estimate of heaped data naively and corrected by partly bayesian model

Description

Plot Kernel density estimate of heaped data naively and corrected by partly bayesian model

Usage

## S3 method for class 'Kernelheaping'
plot(x, trueX = NULL, ...)
## S3 method for class 'Kernelheaping'
plot(x, trueX = NULL, ...)

Arguments

`x`	Kernelheaping object produced by `dheaping` function
`trueX`	optional, if true values X are known (in simulations, for example) the 'Oracle' density estimate is added as well
`...`	additional arguments given to standard plot function

Value

plot with Kernel density estimates (Naive, Corrected and True (if provided))

Simulation of heaping correction method

Description

Simulation of heaping correction method

Usage

sim.Kernelheaping(
  simRuns,
  n,
  distribution,
  rounds,
  thresholds,
  downbias = 0.5,
  setBias = FALSE,
  Beta = 0,
  unequal = FALSE,
  burnin = 5,
  samples = 10,
  bw = "nrd0",
  offset = 0,
  boundary = FALSE,
  adjust = 1,
  ...
)
sim.Kernelheaping(
  simRuns,
  n,
  distribution,
  rounds,
  thresholds,
  downbias = 0.5,
  setBias = FALSE,
  Beta = 0,
  unequal = FALSE,
  burnin = 5,
  samples = 10,
  bw = "nrd0",
  offset = 0,
  boundary = FALSE,
  adjust = 1,
  ...
)

Arguments

`simRuns`	number of simulations runs
`n`	sample size
`distribution`	name of the distribution where random sampling is available, e.g. "norm"
`rounds`	rounding values, numeric vector of length >=1
`thresholds`	rounding thresholds
`downbias`	Bias parameter used in the simulation
`setBias`	if TRUE a rounding Bias parameter is estimated. For values above 0.5, the respondents are more prone to round down, while for values < 0.5 they are more likely to round up
`Beta`	Parameter of the probit model for rounding probabilities used in simulation
`unequal`	if TRUE a probit model is fitted for the rounding probabilities with log(true value) as regressor
`burnin`	burn-in sample size
`samples`	sampling iteration size
`bw`	bandwidth selector method, defaults to "nrd0" see `density` for more options
`offset`	location shift parameter used simulation in simulation
`boundary`	TRUE for positive only data (no positive density for negative values)
`adjust`	as in `density`, the user can multiply the bandwidth by a certain factor such that bw=adjust*bw
`...`	additional attributes handed over to `createSim.Kernelheaping`

Value

List of estimation results

Examples

## Not run: Sims1 <- sim.Kernelheaping(simRuns=2, n=500, distribution="norm",
rounds=c(1,10,100), thresholds=c(0.3,0.4,0.3), sd=100)
## End(Not run)
## Not run: Sims1 <- sim.Kernelheaping(simRuns=2, n=500, distribution="norm",
rounds=c(1,10,100), thresholds=c(0.3,0.4,0.3), sd=100)
## End(Not run)

Simulation Summary

Description

Simulation Summary

Usage

simSummary.Kernelheaping(sim, coverage = 0.9)
simSummary.Kernelheaping(sim, coverage = 0.9)

Arguments

`sim`	Simulation object returned from sim.Kernelheaping
`coverage`	probability for computing coverage intervals

Value

list with summary statistics

Student0405

Description

Data collected during 2004 and 2005 from students in statistics classes at a large state university in the northeastern United States.

Source

http://mathfaculty.fullerton.edu/mori/Math120/Data/readme

References

Utts, J. M., & Heckard, R. F. (2011). Mind on statistics. Cengage Learning.

Prints some descriptive statistics (means and quantiles) for the estimated rounding, bias and acceleration (beta) parameters

Description

Prints some descriptive statistics (means and quantiles) for the estimated rounding, bias and acceleration (beta) parameters

Usage

## S3 method for class 'Kernelheaping'
summary(object, ...)
## S3 method for class 'Kernelheaping'
summary(object, ...)

Arguments

`object`	Kernelheaping object produced by `dheaping` function
`...`	unused

Value

Prints summary statistics

Transfer observations to other shape

Description

Transfer observations to other shape

Usage

toOtherShape(Mestimates, shapefile)
toOtherShape(Mestimates, shapefile)

Arguments

`Mestimates`	Estimation object created by functions dshapebivr and dbivr
`shapefile`	The new shapefile for which the observations shall be transferred to

Value

The function returns the count, sd and 90

Plots some trace plots for the rounding, bias and acceleration (beta) parameters

Description

Plots some trace plots for the rounding, bias and acceleration (beta) parameters

Usage

tracePlots(x, ...)
tracePlots(x, ...)

Arguments

`x`	Kernelheaping object produced by `dheaping` function
`...`	additional arguments given to standard plot function

Value

Prints summary statistics

Package 'Kernelheaping'

Help Index

Create heaped data for Simulation

Description

Usage

Arguments

Value

Bivariate kernel density estimation for rounded data

Description

Usage

Arguments

Value

Examples

Kernel density estimation for classified data

Description

Usage

Arguments

Value

Examples

Kernel density estimation for heaped data

Description

Usage

Arguments

Value

Examples

3d Kernel density estimation for data classified in polygons or shapes

Description

Usage

Arguments

Bivariate Kernel density estimation for data classified in polygons or shapes

Description

Usage

Arguments

Value

Examples

Bivariate Kernel density estimation for data classified in polygons or shapes

Description

Usage

Arguments

Examples

Kernel Density Estimation for Heaped Data

Description

Details

Plot Kernel density estimate of heaped data naively and corrected by partly bayesian model

Description

Usage

Arguments

Value

Plot Kernel density estimate of heaped data naively and corrected by partly bayesian model

Description

Usage

Arguments

Value

Simulation of heaping correction method

Description

Usage

Arguments

Value

Examples

Simulation Summary

Description

Usage

Arguments

Value

Student0405

Description

Source

References

Prints some descriptive statistics (means and quantiles) for the estimated rounding, bias and acceleration (beta) parameters

Description

Usage

Arguments

Value

Transfer observations to other shape

Description

Usage

Arguments

Value

Plots some trace plots for the rounding, bias and acceleration (beta) parameters

Description