The CASMAP package provides methods for searching for combinatorial associations in binary data while taking categorical covariates into account. There are two main modes: the methods search either for region-based mappings or for higher order epistatic interactions.
CASMAP objects

To create a CASMAP object, it is necessary to specify the mode. The first example below creates an object that will perform a region-based GWAS search, and then sets the target family-wise error rate (FWER) to 0.01.
library(CASMAP)
# An example using the "regionGWAS" mode
fastcmh <- CASMAP(mode="regionGWAS") # initialise object
fastcmh$setTargetFWER(0.01) # set target FWER
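To make the idea of a region-based search concrete, here is a small base R sketch that does not use CASMAP. It assumes that a region is a run of consecutive features and that a sample carries the region if it has a 1 at any feature in that run (a logical OR). This is our illustration of the concept, not code taken from the package; the toy matrix `X` and the helper `region_indicator` are hypothetical.

```r
# Hypothetical toy genotype matrix: p = 5 features (rows) x n = 4 samples (columns),
# the same orientation as the CASMAP input file
X <- matrix(c(0, 1, 0, 0,
              0, 0, 1, 0,
              1, 0, 0, 0,
              0, 0, 0, 1,
              0, 0, 0, 0),
            nrow = 5, byrow = TRUE)

# Assumed region encoding: a sample carries the region [start, end] if it has
# a 1 at any of the consecutive features start..end (logical OR)
region_indicator <- function(X, start, end) {
  as.integer(colSums(X[start:end, , drop = FALSE]) > 0)
}

print(region_indicator(X, 1, 2))  # samples 2 and 3 carry this region

# Number of contiguous regions that can be formed from p features
p <- nrow(X)
n_regions <- p * (p + 1) / 2
print(n_regions)  # 15
```

Even for moderate p the number of candidate regions grows quadratically, which is why the correction for multiple testing matters so much in this mode.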
The next example shows how to create an object that will search for arbitrary combinations, i.e. a higher order epistatic search. Note that the target family-wise error rate can also be set when constructing the object, via the alpha parameter.
Printing the object displays its settings. The field Maximum combination size = 0 indicates that combinations of all possible lengths will be considered. In future versions, it will be possible to limit this number, for example to combinations of maximum length 4.
library(CASMAP)
# Another example, doing higher order epistasis search with target FWER 0.01
facs <- CASMAP(mode="higherOrderEpistasis", alpha=0.01)
print(facs)
## CASMAP object with:
## * Mode = higherOrderEpistasis
## * Target FWER = 0.01
## * Maximum combination size = 0
## * No input files read
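For the higher order epistasis mode, the following base R sketch (not taken from CASMAP) shows what a combination of features means under one common convention: we assume a sample carries a combination only if it has a 1 at every feature in it (a logical AND). The second part counts the candidate combinations when no maximum combination size is set, which shows how quickly the search space grows. `X` and `combination_indicator` are hypothetical.

```r
# Hypothetical toy genotype matrix: 3 features (rows) x 4 samples (columns)
X <- matrix(c(1, 1, 0, 1,
              1, 0, 1, 1,
              0, 1, 1, 1),
            nrow = 3, byrow = TRUE)

# Assumed interaction encoding: a sample carries the combination only if it
# has a 1 at every feature in the combination (logical AND)
combination_indicator <- function(X, features) {
  as.integer(colSums(X[features, , drop = FALSE]) == length(features))
}

print(combination_indicator(X, c(1, 2)))  # samples 1 and 4 carry features 1 AND 2

# With no limit on the combination size, every non-empty subset of the
# p features is a candidate: sum over k of choose(p, k) = 2^p - 1
p <- 20
n_candidates <- sum(choose(p, 1:p))
print(n_candidates == 2^p - 1)  # TRUE
```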
Once the object is created, the next step is to read in the data files. This is done with the readFiles method, specifying the paths to the data files via the parameters genotype_file, phenotype_file and (optionally) covariate_file. Example data files are provided with the package, as well as functions to easily get the paths to these files:
library(CASMAP)
fastcmh <- CASMAP(mode="regionGWAS") # initialise object
datafile <- getExampleDataFilename() # file name of example data
labelsfile <- getExampleLabelsFilename() # file name of example labels
covfile <- getExampleCovariatesFilename() # file name of example covariates
# read the data, labels and (optionally) covariate files
fastcmh$readFiles(genotype_file=datafile,
phenotype_file=labelsfile,
covariate_file=covfile)
# The object now shows that the data files have been read and that covariates are used
print(fastcmh)
## CASMAP object with:
## * Mode = regionGWAS
## * Target FWER = 0.05
## * Maximum combination size = 0
## * Input files read
## * Covariate = TRUE
Note that the CASMAP methods expect the data file to be a text file consisting of space-separated 0s and 1s, arranged as a p × n matrix, where each of the p rows is a feature and each of the n columns is a sample/subject. The labels and covariates files are single columns of n entries, where each entry is 0 or 1. To see an example of the data format, take a look at the included example files, whose paths are given by the functions getExampleDataFilename, getExampleLabelsFilename and getExampleCovariatesFilename:
#to see where these data files are located on your local drive:
print(getExampleDataFilename())
## Example:
## [1] "/path/to/pkgs/CASMAP/extdata/CASMAP_example_data_1.txt"
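To prepare your own input, the following base R sketch writes hypothetical toy files in the format described above: a space-separated 0/1 matrix with features in rows and samples in columns, plus single-column label and covariate files. The sizes, the random values and the temporary file paths are arbitrary choices made for illustration.

```r
# Hypothetical toy data: p = 6 features, n = 8 samples
set.seed(1)
p <- 6
n <- 8
genotypes  <- matrix(rbinom(p * n, size = 1, prob = 0.3), nrow = p, ncol = n)
phenotypes <- rep(c(0, 1), each = n / 2)    # one 0/1 label per sample
covariates <- rep(c(0, 1), times = n / 2)   # one 0/1 covariate per sample

data_file  <- tempfile(fileext = ".txt")
label_file <- tempfile(fileext = ".txt")
cov_file   <- tempfile(fileext = ".txt")

# Space-separated 0/1 matrix: one row per feature, one column per sample, no header
write.table(genotypes, data_file, sep = " ", row.names = FALSE, col.names = FALSE)
# Labels and covariates: a single column of n entries
write.table(phenotypes, label_file, row.names = FALSE, col.names = FALSE)
write.table(covariates, cov_file, row.names = FALSE, col.names = FALSE)
```

The resulting paths could then be passed to readFiles through the genotype_file, phenotype_file and covariate_file parameters.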
In future versions the PLINK data format will be supported.
Once you have read in the data, label and covariates files, you are
ready to execute the algorithm. Simply use the execute
command. Note that, depending on the size of your data set, this could
take some time.
There are two main sets of results: the summary results and the significant regions.
The summary results provide information on how many regions/interactions were processed and how many are testable, as well as the testability threshold and the corrected significance threshold:
## $n.int.processed
## [1] 18193
##
## $n.int.testable
## [1] 16426
##
## $testability.threshold
## [1] 2.630268e-06
##
## $target.fwer
## [1] 0.05
##
## $corrected.significance.threshold
## [1] 3.043955e-06
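The reported numbers are consistent with the corrected significance threshold being the target FWER divided by the number of testable regions, which is how Tarone's approach corrects only over testable hypotheses. The short base R check below uses only values printed in this vignette; it is a sanity check on the output, not CASMAP's internal computation.

```r
# Values copied from the summary output above
target_fwer <- 0.05
n_testable  <- 16426
testability_threshold <- 2.630268e-06

# Dividing the target FWER by the number of testable regions
corrected_threshold <- target_fwer / n_testable
print(signif(corrected_threshold, 7))  # 3.043955e-06

# Testable regions times the testability threshold stays within the target FWER
print(n_testable * testability_threshold <= target_fwer)  # TRUE

# p-value of the significant region reported in this vignette
print(7.192676e-07 < corrected_threshold)  # TRUE
```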
It is also possible to write this information to file directly using
the writeSummary
command.
The significant regions output lists all the regions that are considered significant. However, these regions may overlap and form clusters. The most significant region in each cluster can be extracted using the getSignificantClusterRepresentatives method. In the example below, there is only one significant region, so it is its own cluster representative:
## start end score odds_ratio pvalue
## 1 99 102 24.56281 16.12821 7.192676e-07
#get the clustered representatives for the significant regions
sig_cluster_rep <- fastcmh$getSignificantClusterRepresentatives()
print(sig_cluster_rep)
## start end score odds_ratio pvalue
## 1 99 102 24.56281 16.12821 7.192676e-07
Note that the p-value and odds ratio of each region/representative are provided along with its location.
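The exact clustering rule used by the package is not described here, so the following base R function is only a hypothetical sketch of the idea: sort the regions by start position, merge regions that overlap into clusters, and keep the region with the smallest p-value in each cluster. The `regions` data frame and `cluster_representatives` are made up for illustration, and CASMAP's own criterion may differ.

```r
# Hypothetical significant regions: feature start/end and p-value
regions <- data.frame(start  = c(10, 12, 15, 40, 41),
                      end    = c(14, 16, 18, 42, 45),
                      pvalue = c(2e-6, 5e-7, 1e-6, 3e-6, 8e-7))

cluster_representatives <- function(regions) {
  if (nrow(regions) == 0) return(regions)
  regions <- regions[order(regions$start), ]
  cluster <- integer(nrow(regions))
  current <- 1L
  max_end <- regions$end[1]
  cluster[1] <- current
  for (i in seq_len(nrow(regions))[-1]) {
    if (regions$start[i] > max_end) {
      # no overlap with the current cluster: start a new one
      current <- current + 1L
      max_end <- regions$end[i]
    } else {
      # overlaps: extend the current cluster
      max_end <- max(max_end, regions$end[i])
    }
    cluster[i] <- current
  }
  # keep the region with the smallest p-value in each cluster
  reps <- do.call(rbind, lapply(split(regions, cluster),
                                function(d) d[which.min(d$pvalue), ]))
  rownames(reps) <- NULL
  reps
}

print(cluster_representatives(regions))  # regions 12-16 and 41-45
```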
For the higherOrderEpistasis mode, the getSignificantInteractions method should be used instead (there are no cluster representatives in this mode).
It is also possible to perform a search without any covariates:
# Another example of regionGWAS
fais <- CASMAP(mode="regionGWAS") # initialise object
# read the data and labels, but no covariates
fais$readFiles(genotype_file=getExampleDataFilename(),
phenotype_file=getExampleLabelsFilename())
print(fais)
## CASMAP object with:
## * Mode = regionGWAS
## * Target FWER = 0.05
## * Maximum combination size = 0
## * Input files read
## * Covariate = FALSE
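The covariates are categorical, and to our understanding CASMAP accounts for them in the style of the Cochran-Mantel-Haenszel (CMH) test, as in the FastCMH method. The base R sketch below uses `stats::mantelhaen.test` (not CASMAP) on one hypothetical region: the counts are invented, and each covariate category contributes its own 2 × 2 table of region presence against phenotype.

```r
# Hypothetical counts for one candidate region. The array is filled column by
# column: within each covariate category, the "case" column holds
# (present, absent) and then the "control" column holds (present, absent).
tab <- array(c(12, 5, 8, 15,    # covariate category "A"
               10, 4, 9, 17),   # covariate category "B"
             dim = c(2, 2, 2),
             dimnames = list(region    = c("present", "absent"),
                             phenotype = c("case", "control"),
                             covariate = c("A", "B")))

# CMH test of association between region and phenotype, stratified by covariate
res <- mantelhaen.test(tab)
print(res$p.value)
print(res$estimate)  # common odds ratio across covariate categories
```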
The binary data can be encoded with either a dominant or a recessive encoding. The default for readFiles is dominant, but the encoding can also be specified explicitly:
library(CASMAP)
fastcmh <- CASMAP(mode="regionGWAS")
# using the dominant encoding (default)
fastcmh$readFiles(genotype_file=getExampleDataFilename(),
phenotype_file=getExampleLabelsFilename(),
covariate_file=getExampleCovariatesFilename(),
encoding="dominant")
# using the recessive encoding
fastcmh$readFiles(genotype_file=getExampleDataFilename(),
phenotype_file=getExampleLabelsFilename(),
covariate_file=getExampleCovariatesFilename(),
encoding="recessive")
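As background, the sketch below shows the conventional meaning of these two encodings in genetics when a feature is a minor-allele count of 0, 1 or 2. It is plain base R; how exactly CASMAP applies the encoding to its input is an assumption here, and `allele_counts` is hypothetical.

```r
# Hypothetical minor-allele counts (0, 1 or 2) for one feature across 6 samples
allele_counts <- c(0, 1, 2, 1, 0, 2)

# Dominant: carrying at least one minor allele is coded as 1
dominant <- as.integer(allele_counts >= 1)
# Recessive: only carrying two minor alleles is coded as 1
recessive <- as.integer(allele_counts == 2)

print(dominant)   # 0 1 1 1 0 1
print(recessive)  # 0 0 1 0 0 1
```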
Note that future versions of the package will include the option to read PLINK files, and the option to set the maximum combination length.