Simulation experiments with SaTScan and rsatscan

In this vignette, I use the space-time permutation scan to show how the ‘rsatscan’ package can be used to simplify the process of making data in R, running ‘SaTScan’ on the generated data, and collecting the results, presumably leading to quicker and easier accumulation of results.

I begin by making data on a 10*10 grid of locations, over 30 days. Each day, each location has a 0.1 probability of having a single case.

set.seed(42)
mygeo = expand.grid(1:10,1:10)
daysbase = 30
locid = rep(1:100, times=daysbase)
basecas = rbinom(3000, 1, .1)
day = rep(1:30, each = 100)
mycas = data.frame(locid,basecas, day)

Here’s what the geo and case files look like. I’m using generic time, for convenience.

head(mygeo)
##   Var1 Var2
## 1    1    1
## 2    2    1
## 3    3    1
## 4    4    1
## 5    5    1
## 6    6    1
head(mycas)
##   locid basecas day
## 1     1       1   1
## 2     2       1   1
## 3     3       0   1
## 4     4       0   1
## 5     5       0   1
## 6     6       0   1

Now I can write the data into the OS; the row names in the mygeo data.frame object are the location IDs for ‘SaTSCan’, so I’m using the userownames option to use, rather than ignore, the row names from R in the geography file; in the case file, there is an explicit column with the same information included.

library("rsatscan")
td = tempdir()
write.geo(mygeo, location = td, file = "mygeo", userownames=TRUE)
write.cas(mycas, location = td, file = "mycas")

Now I’m ready to build the parameter file. This is adapted pretty closely from the NYCfever example in the rsatscan vignette.

invisible(ss.options(reset=TRUE))
ss.options(list(CaseFile="mycas.cas", PrecisionCaseTimes=4))
ss.options(list(StartDate="1", CoordinatesType=0, TimeAggregationUnits=4))
ss.options(list(EndDate="30", CoordinatesFile="mygeo.geo", AnalysisType=4, ModelType=2)) 
ss.options(list(UseDistanceFromCenterOption="y", MaxSpatialSizeInDistanceFromCenter=3)) 
ss.options(list(NonCompactnessPenalty=0, MaxTemporalSizeInterpretation=1, MaxTemporalSize=7))
ss.options(list(ProspectiveStartDate="30", ReportGiniClusters="n", LogRunToHistoryFile="n"))
ss.options(list(SaveSimLLRsDBase="y"))

Then I write the parameter file into the OS and run ‘SaTScan’ using it. I’ll peek in the summary cluster table to see what we got. Note that the location and batch file name of your ‘SaTScan’ installation may vary on your machine.

write.ss.prm(td, "mybase")
# This step omitted in compliance with CRAN policies
# Please install 'SaTScan' and run the vignette with this and following code uncommented
# 'SaTScan' can be downloaded from www.satscan.org, free of charge
# you will also find there fully compiled versions of this vignette with results

# mybase = satscan(td, "mybase", sslocation="C:/Program Files/SaTScan", ssbatchfilename="SaTScanBatch64")
# mybase$col[3:10]

As one would hope, there’s no evidence of a meaningful cluster.

Now, let’s add a day just like the others. I’ll stick it onto the end of the previous data, then write out a new case file.

newday = data.frame(locid = 1:100, basecas = rbinom(100,1,.1), day = 31)
newcas = rbind(mycas,newday)
write.cas(newcas, location = td, file = "mycas")

I don’t need to re-assign any parameter values that don’t change between runs. In this case, since I used the same name for the data file, I only need to change the end date of the surveillance period.

ss.options(list(EndDate="31"))
write.ss.prm(td, "day1")

# day1 = satscan(td, "day1", sslocation="C:/Program Files/SaTScan", ssbatchfilename="SaTScanBatch64")
# day1$col[3:10]

Again, no clusters, as we would expect.

But now let’s make a cluster appear. I create an additional time unit as before, but then select a location to get a heap of extra cases. Glue the new day to the end of the old case file, write it to the OS, change the end date, and re-run ‘SaTScan’.

newday = data.frame(locid = 1:100, basecas = rbinom(100,1,.1), day = 32)
newday$basecas[20] =5
newcas = rbind(mycas,newday)

write.cas(newcas, location = td, file = "mycas")

ss.options(list(EndDate="32"))
write.ss.prm(td, "day2")

# day2 = satscan(td,"day2", sslocation="C:/Program Files/SaTScan", ssbatchfilename="SaTScanBatch64")
# day2$col[3:10]

This demonstrates that I did detect what I inserted. I can also extract the wordier section of the report about this cluster.

# summary(day2)
# cat(day2$main[20:31],fill=1)