Package 'varitas' reference manual

Title:	Variant Calling in Targeted Analysis Sequencing Data
Description:	Multi-caller variant analysis pipeline for targeted analysis sequencing (TAS) data. Features a modular, automated workflow that can start with raw reads and produces a user-friendly PDF summary and a spreadsheet containing consensus variant information.
Authors:	Adam Mills [aut, cre], Erle Holgersen [aut], Ros Cutts [aut], Syed Haider [aut]
Maintainer:	Adam Mills <Adam.Mills@icr.ac.uk>
License:	GPL-2
Version:	0.0.2
Built:	2025-03-12 06:54:58 UTC
Source:	CRAN

alternate.gene.sort

Description

Given a data frame containing coverage statistics and gene information, returns that frame with the rows sorted by alternating gene size (for plotting)

Usage

alternate.gene.sort(coverage.statistics)
alternate.gene.sort(coverage.statistics)

Arguments

coverage.statistics

Data frame of coverage statistics

Details

Genes have varying numbers of associated amplicons and when plotting coverage statistics, if two genes with very low numbers of amplicons are next to each other, the labels will overlap. This function sorts the coverage statistics data frame in a way that places the genes with the most amplicons (largest) next to those with the least (smallest).

Value

Coverage statistics data frame sorted by alternating gene size

build.variant.specification

Description

Build data frame with paths to variant files.

Usage

build.variant.specification(sample.ids, project.directory)
build.variant.specification(sample.ids, project.directory)

Arguments

`sample.ids`	Vector of sample IDs. Must match subdirectories in project.directory.
`project.directory`	Path to directory where sample subdirectories

Details

Parses through sample IDs in a project directory and returns paths to variant files based on (theoretical) file name patterns. Useful for testing, or for entering the pipeline at non-traditional stages.

Value

Data frame with paths to variant files.

. Make Venn diagram of variant caller overlap

Description

. Make Venn diagram of variant caller overlap

Usage

caller.overlap.venn.diagram(variants, file.name)
caller.overlap.venn.diagram(variants, file.name)

Arguments

`variants`	Data frame containing variants, typically from merge.variants function
`file.name`	Name of output file

capitalize.caller

Description

Capitalize variant caller name

Usage

capitalize.caller(caller)

capitalise.caller(caller)
capitalize.caller(caller)

capitalise.caller(caller)

Arguments

caller

Character vector of callers to be capitalized

Value

Vector of same length as caller where eligible callers have been capitalized

classify.variant

Description

Classify a variant as SNV, MNV, or indel based on the reference and alternative alleles

Usage

classify.variant(ref, alt)
classify.variant(ref, alt)

Arguments

`ref`	Vector of reference bases
`alt`	Vector of alternate bases

Value

Character vector giving type of variant.

Convert output of iDES step 1 to variant call format

Description

Convert output of iDES step 1 to variant call format

Usage

convert.ides.output(filename, output = TRUE,
  output.suffix = ".calls.txt", minreads = 5, mindepth = 50)
convert.ides.output(filename, output = TRUE,
  output.suffix = ".calls.txt", minreads = 5, mindepth = 50)

Arguments

`filename`	Path to file
`output`	Logical indicating whether output should be saved to file. Defaults to true.
`output.suffix`	Suffix to be appended to input filename if saving results to file
`minreads`	Minimum numbers of reads
`mindepth`	Minimum depth

Value

potential.calls Data frame of converted iDES calls

create.directories

Description

Create directories in a given path

Usage

create.directories(directory.names, path)
create.directories(directory.names, path)

Arguments

`directory.names`	Vector of names of directories to be created
`path`	Path where directories should be created

date.stamp.file.name

Description

Prefix file name with a date-stamp.

Usage

date.stamp.file.name(file.name, date = Sys.Date(), separator = "_")
date.stamp.file.name(file.name, date = Sys.Date(), separator = "_")

Arguments

`file.name`	File name to be date-stamped
`date`	Date to be added. Defaults to current date.
`separator`	String that should separate the date from the file name. Defaults to a single underscore.

Value

String giving the datestamped file name

Examples

date.stamp.file.name('plot.png');
date.stamp.file.name('yesterdays_plot.png', date = Sys.Date() - 1);

date.stamp.file.name('plot.png');
date.stamp.file.name('yesterdays_plot.png', date = Sys.Date() - 1);

Extract sample IDs from file paths

Description

Extract sample IDs from a set of paths to files in sample-specific subfolders

Usage

extract.sample.ids(paths, from.filename = FALSE)
extract.sample.ids(paths, from.filename = FALSE)

Arguments

`paths`	vector of file paths
`from.filename`	Logical indicating whether sample ID should be extracted from filename rather than path

Value

vector of extracted sample IDs

Filter variants in file.

Description

Filter variants from file, and save to output. Wrapper function that opens the variant file, calls filter.variants, and saves the result to file

Usage

filter.variant.file(variant.file, output.file, config.file = NULL,
  caller = c("vardict", "ides", "mutect", "pgm", "consensus"))
filter.variant.file(variant.file, output.file, config.file = NULL,
  caller = c("vardict", "ides", "mutect", "pgm", "consensus"))

Arguments

`variant.file`	Path to variant file
`output.file`	Path to output file
`config.file`	Path to config file to be used. If not supplied, will use the pre-existing VariTAS options.
`caller`	Name of caller used (needed to match appropriate filters from settings)

Value

None

Filter variant calls

Description

Filter data frame of variant calls based on thresholds specified in settings.

Usage

filter.variants(variants, caller = c("vardict", "ides", "mutect", "pgm",
  "consensus", "isis", "varscan", "lofreq"), config.file = NULL,
  verbose = FALSE)
filter.variants(variants, caller = c("vardict", "ides", "mutect", "pgm",
  "consensus", "isis", "varscan", "lofreq"), config.file = NULL,
  verbose = FALSE)

Arguments

`variants`	Data frame of variant calls with ANNOVAR annotation, or path to variant file.
`caller`	Name of caller used (needed to match appropriate filters from settings)
`config.file`	Path to config file to be used. If not supplied, will use the pre-existing VariTAS options.
`verbose`	Logical indicating whether to output descriptions of filtering steps. Defaults to False, useful for debugging.

Value

filtered.variants Data frame of filtered variants

fix.lofreq.af

Description

LoFreq also does not output allele frequencies, so this script calculates them from the DP (depth) and AD (variant allele depth) values–which are also not output nicely– and adds them to the annotated vcf.

Usage

fix.lofreq.af(variant.specification)
fix.lofreq.af(variant.specification)

Arguments

variant.specification

Data frame of variant file information

Fix variant call column names

Description

Fix headers of variant calls to prepare for merging. This mostly consists in making sure the column headers will be unique by prefixing the variant caller in question.

Usage

fix.names(column.names, variant.caller, sample.id = NULL)
fix.names(column.names, variant.caller, sample.id = NULL)

Arguments

`column.names`	Character vector of column names
`variant.caller`	String giving name of variant caller
`sample.id`	Optional sample ID. Used to fix headers.

Value

new.column.names Vector of column names after fixing]

fix.varscan.af

Description

VarScan does not output allele frequencies, so this script calculates them from the DP (depth) and AD (variant allele depth) values and adds them to the annotated vcf.

Usage

fix.varscan.af(variant.specification)
fix.varscan.af(variant.specification)

Arguments

variant.specification

Data frame of variant file information

Get base substitution

Description

Get base substitution represented by pyrimidine in base pair. If more than one base in REF/ALT (i.e. MNV or indel rather than SNV), NA will be returned

Usage

get.base.substitution(ref, alt)
get.base.substitution(ref, alt)

Arguments

`ref`	Vector of reference bases
`alt`	Vector of alternate bases

Value

base.substitutions

get.bed.chromosomes

Description

Extract chromosomes from bed file

Usage

get.bed.chromosomes(bed)
get.bed.chromosomes(bed)

Arguments

bed

Path to BED file

Value

Vector containing all chromosomes in BED file

get.buildver

Description

Get build version (hg19/hg38) based on settings.

Parses VariTAS pipeline settings to get the build version. When this function was first developed, the idea was to be able to explicitly set ANNOVAR filenames based on the build version.

Usage

get.buildver()
get.buildver()

Value

String giving reference genome build version (hg19 or hg38)

Generate a colour scheme

Description

Generate a colour scheme

Usage

get.colours(n)
get.colours(n)

Arguments

`n`	Number of colours desired

Value

Colour.scheme generated colours

Process sample coverage per amplicon data

Description

Parse coverageBed output to get coverage by amplicon

Usage

get.coverage.by.amplicon(project.directory)
get.coverage.by.amplicon(project.directory)

Arguments

project.directory

Path to project directory. Each sample should have its own subdirectory

Value

combined.data Data frame giving coverage per amplicon per sample.

References

http://bedtools.readthedocs.io/en/latest/content/tools/coverage.html

Get statistics about coverage per sample

Description

Get statistics about coverage per sample

Usage

get.coverage.by.sample.statistics(project.directory)
get.coverage.by.sample.statistics(project.directory)

Arguments

project.directory

Path to project directory. Each sample should have its own subdirectory

Value

coverage.by.sample.statistics Data frame with coverage statistics per sample

get.fasta.chromosomes

Description

Extract chromosomes from fasta headers.

Usage

get.fasta.chromosomes(fasta)
get.fasta.chromosomes(fasta)

Arguments

fasta

Path to reference fasta

Value

Vector containing all chromosomes in fasta file.

get.file.path

Description

Get absolute path to sample-specific file for one or more samples

Usage

get.file.path(sample.ids, directory, extension = NULL,
  allow.multiple = FALSE, allow.none = FALSE)
get.file.path(sample.ids, directory, extension = NULL,
  allow.multiple = FALSE, allow.none = FALSE)

Arguments

`sample.ids`	Vector of sample IDs to match filename on
`directory`	Path to directory containing files
`extension`	String giving extension of file
`allow.multiple`	Boolean indicating whether to allow multiple matching files. Defaults to false, which throws an error if the query matches more than one file.
`allow.none`	Boolean indicating whether to allow no matching files. Defaults to false, which throws an error if the query does not match any files.

Value

Paths to matched files

get.filters

Description

Determine filters per caller, given default and caller-specific values.

Usage

get.filters(filters)
get.filters(filters)

Arguments

filters

List of filter values. These will be updated to use default as the baseline, with caller-specific filters taking precedence if supplied.

Value

A list with updated filters

get.gene

Description

Use guesswork to extract gene from data frame of targeted panel data. The panel designer output can change, so try to guess what the format is.

Usage

get.gene(bed.data)
get.gene(bed.data)

Arguments

bed.data

Data frame containing data from bed file

Value

vector of gene names, one entry for each row of bed.data

get.miniseq.sample.files

Description

Get files for a sample in a directory, ensuring there's only a single match per sample ID.

Usage

get.miniseq.sample.files(sample.ids, directory,
  file.suffix = "_S\\d{1,2}_.*")
get.miniseq.sample.files(sample.ids, directory,
  file.suffix = "_S\\d{1,2}_.*")

Arguments

`sample.ids`	Vector of sample ids. Should form first part of file name
`directory`	Directory where files can be found
`file.suffix`	Regex expression for end of file name. For example, ‘file.suffix = ’_S\d1,2_._R1_.'' will match R1 files.1 files.

Value

Character vector of file paths

Helper function to recursively get an VariTAS option

Description

Helper function to recursively get an VariTAS option

Usage

get.option(name, varitas.options = NULL, nesting.character = "\\.")
get.option(name, varitas.options = NULL, nesting.character = "\\.")

Arguments

`name`	Option name
`varitas.options`	Optional list of options to search in
`nesting.character`	String giving Regex pattern of nesting indication string. Defaults to '\.'

Value

value Requested option

Summarise panel coverage by gene

Description

Summarise panel coverage by gene

Usage

get.panel.coverage.by.gene(panel.file, gene.col = 5)
get.panel.coverage.by.gene(panel.file, gene.col = 5)

Arguments

`panel.file`	path to panel
`gene.col`	index of column containing gene name

Value

panel.coverage.by.gene data frame giving the number of amplicons and their total length by gene

Get pool corresponding to each amplicon

Description

The bed files are not consistent, so it's not clear where the pool will appear. This function parses through the columns to identify where the pool

Usage

get.pool.from.panel.data(panel.data)
get.pool.from.panel.data(panel.data)

Arguments

panel.data

data frame pool should be extracted from

Value

pools vector of pool information

Return VariTAS settings

Description

Return VariTAS settings

Usage

get.varitas.options(option.name = NULL, nesting.character = "\\.")
get.varitas.options(option.name = NULL, nesting.character = "\\.")

Arguments

`option.name`	Optional name of option. If no name is supplied, the full list of VariTAS options will be provided.
`nesting.character`	String giving Regex pattern of nesting indication string. Defaults to '\.'

Value

varitas.options list specifying VariTAS options

Examples

reference.build <- get.varitas.options('reference_build');
mutect.filters <- get.varitas.options('filters.mutect');

reference.build <- get.varitas.options('reference_build');
mutect.filters <- get.varitas.options('filters.mutect');

get.vcf.chromosomes

Description

Extract chromosomes from a VCF file.

Usage

get.vcf.chromosomes(vcf)
get.vcf.chromosomes(vcf)

Arguments

vcf

Path to VCF file

Value

Vector containing all chromosomes in VCF

Check if a key is in VariTAS options

Description

Check if a key is in VariTAS options

Usage

in.varitas.options(option.name = NULL, varitas.options = NULL,
  nesting.character = "\\.")
in.varitas.options(option.name = NULL, varitas.options = NULL,
  nesting.character = "\\.")

Arguments

`option.name`	String giving name of option (with different levels joined by `nesting.character`)
`varitas.options`	Ampliseq options as a list. If missing, they will be obtained from `get.varitas.options()`
`nesting.character`	String giving Regex pattern of nesting indication string. Defaults to '\.'

Value

in.options Boolean indicating if the option name exists in the current varitas options

logical.to.character

Description

Convert a logical vector to a T/F coded character vector. Useful for preventing unwanted T->TRUE nucleotide conversions

Usage

logical.to.character(x)
logical.to.character(x)

Arguments

`x`	Vector to be converted

Value

Character vector after converting TRUE/FALSE

Make string with command line call from its individual components

Description

Make string with command line call from its individual components

Usage

make.command.line.call(main.command, options = NULL, flags = NULL,
  option.prefix = "--", option.separator = " ", flag.prefix = "--")
make.command.line.call(main.command, options = NULL, flags = NULL,
  option.prefix = "--", option.separator = " ", flag.prefix = "--")

Arguments

`main.command`	String or vector of strings giving main part of command (e.g. "python test.py" or c("python", "test.py"))
`options`	Named vector or list giving options
`flags`	Vector giving flags to include.
`option.prefix`	String to preface all options. Defaults to "–"
`option.separator`	String to separate options form their values. Defaults to a single space.
`flag.prefix`	String to preface all flags. Defaults to "–"

Value

command string giving command line call

mean.field.value

Description

Get mean value of a variant annotation field

Usage

## S3 method for class 'field.value'
mean(variants, field = c("TUMOUR.DP", "NORMAL.DP",
  "NORMAL.AF", "TUMOUR.AF", "QUAL"), caller = c("consensus", "vardict",
  "pgm", "mutect", "isis", "varscan", "lofreq"))
## S3 method for class 'field.value'
mean(variants, field = c("TUMOUR.DP", "NORMAL.DP",
  "NORMAL.AF", "TUMOUR.AF", "QUAL"), caller = c("consensus", "vardict",
  "pgm", "mutect", "isis", "varscan", "lofreq"))

Arguments

`variants`	Data frame with variants
`field`	String giving field of interest.
`caller`	String giving caller to calculate values from

Details

As part of the variant merging process, annotated variant data frames are merged into one, with the value from each caller prefixed by CALLER. For example, the VarDict normal allele freqeuncy will have header VARDICT.NORMAL.AF. This function takes the average of all callers' value for a given field, removing NA's. If only a single caller is present in the data frame, that value is returned.

Value

Vector of mean values.

Merge potential iDES calls with variant annotation.

Description

Merge potential iDES calls with variant annotation.

Usage

## S3 method for class 'ides.annotation'
merge(ides.filename, output = TRUE,
  output.suffix = ".ann.txt",
  annovar.suffix.pattern = ".annovar.hg(\\d{2})_multianno.txt")
## S3 method for class 'ides.annotation'
merge(ides.filename, output = TRUE,
  output.suffix = ".ann.txt",
  annovar.suffix.pattern = ".annovar.hg(\\d{2})_multianno.txt")

Arguments

`ides.filename`	Path to formatted iDES output (typically from convert.ides.output file)
`output`	Logical indicating whether output should be saved to file. Defaults to true.
`output.suffix`	Suffix to be appended to input filename if saving results to file
`annovar.suffix.pattern`	Suffix to match ANNOAR file

Details

The VarDict variant calling includes a GATK call merging the call vcf file (allele frequency information etc.) with the ANNOVAR annotation, and saving the result as a table. This function is an attempt to emulate that step for the iDES calls.

Value

annotated.calls Data frame of annotations and iDES output.

Merge variants

Description

Merge variants from multiple callers and return a data frame of merged calls. By default filtering is also applied, although this behaviour can be turned off by setting apply.filters to FALSE.

Usage

## S3 method for class 'variants'
merge(variant.specification, apply.filters = TRUE,
  remove.structural.variants = TRUE,
  separate.consensus.filters = FALSE, verbose = FALSE)
## S3 method for class 'variants'
merge(variant.specification, apply.filters = TRUE,
  remove.structural.variants = TRUE,
  separate.consensus.filters = FALSE, verbose = FALSE)

Arguments

`variant.specification`	Data frame containing details of file paths, sample IDs, and caller.
`apply.filters`	Logical indicating whether to apply filters. Defaults to TRUE.
`remove.structural.variants`	Logical indicating whether structural variants (including CNVs) should be removed. Defaults to TRUE.
`separate.consensus.filters`	Logical indicating whether to apply different thresholds to variants called by more than one caller (specified under consensus in config file). Defaults to FALSE.
`verbose`	Logical indicating whether to print information to screen

Value

Data frame

overwrite.varitas.options

Description

Overwrite VariTAS options with options provided in config file.

Usage

overwrite.varitas.options(config.file)
overwrite.varitas.options(config.file)

Arguments

config.file

Path to config file that should be used to overwrite options

Value

None

Examples

## Not run: 
config <- file.path(path.package('varitas'), 'config.yaml')
overwrite.varitas.options(config)

## End(Not run)

## Not run: 
config <- file.path(path.package('varitas'), 'config.yaml')
overwrite.varitas.options(config)

## End(Not run)

Parse job dependencies

Description

Parse job dependencies to make the functions more robust to alternate inputs (e.g. people writing alignment instead of bwa)

Usage

parse.job.dependencies(dependencies)
parse.job.dependencies(dependencies)

Arguments

dependencies

Job dependency strings to be parsed.

Value

parsed.dependencies Vector of job dependencies after reformatting.

plot.amplicon.coverage.per.sample

Description

Create one scatterplot per sample, showing coverage per amplicon, and an additional plot giving the median

Usage

## S3 method for class 'amplicon.coverage.per.sample'
plot(coverage.statistics,
  output.directory)
## S3 method for class 'amplicon.coverage.per.sample'
plot(coverage.statistics,
  output.directory)

Arguments

`coverage.statistics`	Data frame containing coverage per amplicon per sample, typically from `get.coverage.by.amplicon`.
`output.directory`	Directory where per sample plots should be saved

Value

None

Plot amplicon coverage by genome order

Description

Use values obtained by bedtools coverage to make a plot of coverage by genome order

Usage

## S3 method for class 'coverage.by.genome.order'
plot(coverage.data)
## S3 method for class 'coverage.by.genome.order'
plot(coverage.data)

Arguments

coverage.data

data frame with results from bedtools coverage command

plot.coverage.by.sample

Description

Make a barplot of coverage per sample

Usage

## S3 method for class 'coverage.by.sample'
plot(coverage.sample, file.name,
  statistic = c("mean", "median"))
## S3 method for class 'coverage.by.sample'
plot(coverage.sample, file.name,
  statistic = c("mean", "median"))

Arguments

`coverage.sample`	Data frame of coverage data, typically from `get.coverage.by.sample.statistics`
`file.name`	Name of output file
`statistic`	Statistic to be plotted (mean or median)

Value

None

plot.ontarget.percent

Description

Make a scatterplot of ontarget percent per sample

Usage

## S3 method for class 'ontarget.percent'
plot(coverage.sample, file.name)
## S3 method for class 'ontarget.percent'
plot(coverage.sample, file.name)

Arguments

`coverage.sample`	Data frame of coverage data, typically from `get.coverage.by.sample.statistics`
`file.name`	Name of output file

Value

None

plot.paired.percent

Description

Make a barplot of percent paired reads per sample

Usage

## S3 method for class 'paired.percent'
plot(coverage.sample, file.name)
## S3 method for class 'paired.percent'
plot(coverage.sample, file.name)

Arguments

`coverage.sample`	Data frame of coverage data, typically from `get.coverage.by.sample.statistics`
`file.name`	Name of output file

Value

None

Post-processing of variants to generate outputs

Description

Post-processing of variants to generate outputs

Usage

post.processing(variant.specification, project.directory,
  config.file = NULL, variant.callers = NULL,
  remove.structural.variants = TRUE,
  separate.consensus.filters = FALSE, sleep = FALSE, verbose = FALSE)
post.processing(variant.specification, project.directory,
  config.file = NULL, variant.callers = NULL,
  remove.structural.variants = TRUE,
  separate.consensus.filters = FALSE, sleep = FALSE, verbose = FALSE)

Arguments

`variant.specification`	Data frame specifying variants to be processed, or path to data frame (useful if calling from Perl)
`project.directory`	Directory where output should be stored. Output files will be saved to a datestamped subdirectory
`config.file`	Path to config file specifying post-processing options. If not provided, the current options are used (i.e. from `get.varitas.options()`)
`variant.callers`	Optional vector of variant callers for which filters should be included in Excel file
`remove.structural.variants`	Logical indicating whether structural variants (including CNVs) should be removed. Defaults to TRUE.
`separate.consensus.filters`	Logical indicating whether to apply different thresholds to variants called by more than one caller (specified under consensus in config file). Defaults to FALSE.
`sleep`	Logical indicating whether script should sleep for 60 seconds before starting.
`verbose`	Logical indicating whether to print verbose output

Value

None

Prepare BAM specification data frame to standardized format for downstream analyses.

Description

This function prepares a data frame that can be used to run variant callers. For matched normal variant calling, this data frame will contain three columns with names: sample.id, tumour.bam, normal.bam For unpaired variant calling, the data frame will contain two columns with names: sample.id, tumour.bam

Usage

prepare.bam.specification(sample.details, paired = TRUE,
  sample.id.column = 1, tumour.bam.column = 2, normal.bam.column = 3)
prepare.bam.specification(sample.details, paired = TRUE,
  sample.id.column = 1, tumour.bam.column = 2, normal.bam.column = 3)

Arguments

`sample.details`	Data frame where each row represents a sample to be run. Must contain sample ID, path to tumour BAM, and path to normal BAM.
`paired`	Logical indicating whether the sample specification is for a paired analysis.
`sample.id.column`	Index or string giving column of sample.details that contains the sample ID
`tumour.bam.column`	Index or string giving column of sample.details that contains the path to the tumour BAM
`normal.bam.column`	Index or string giving column of sample.details that contains the path to the normal BAM

Value

bam.specification Data frame with one row per sample to be run

prepare.fastq.specification

Description

Prepare FASTQ specification data frame to standardized format for downstream analyses.

Usage

prepare.fastq.specification(sample.details, sample.id.column = 1,
  fastq.columns = c(2, 3), patient.id.column = NA,
  tissue.column = NA)
prepare.fastq.specification(sample.details, sample.id.column = 1,
  fastq.columns = c(2, 3), patient.id.column = NA,
  tissue.column = NA)

Arguments

`sample.details`	Data frame where each row represents a sample to be run. Must contain sample ID, path to tumour BAM, and path to normal BAM.
`sample.id.column`	Index or string giving column of `sample.details` that contains the sample ID
`fastq.columns`	Index or string giving column(s) of `sample.details` that contain path to FASTQ files
`patient.id.column`	Index or string giving column of `sample.details` that contains the patient ID
`tissue.column`	Index or string giving column of `sample.details` that contains information on tissue (tumour/ normal)

Details

This function prepares a data frame that can be used to run alignment. For paired-end reads, this data frame will contain three columns with names: sample.id, reads, mates For single-end reads, the data frame will contain two columns with names: sample.id, reads

Value

Data frame with one row per sample to be run

prepare.miniseq.specifications

Description

Process a MiniSeq directory and sample sheet to get specification data frames that can be used to run the VariTAS pipeline.

Note: This assumes normal samples are not available.

Usage

prepare.miniseq.specifications(sample.sheet, miniseq.directory)
prepare.miniseq.specifications(sample.sheet, miniseq.directory)

Arguments

`sample.sheet`	Data frame containing sample information, or path to a MiniSeq sample sheet
`miniseq.directory`	Path to directory with MiniSeq files

Value

A list with specification data frames 'fastq', 'bam', and 'vcf' (as applicable)

Examples

miniseq.sheet <- file.path(path.package('varitas'), 'extdata/miniseq/Example_template.csv')
miniseq.directory <- file.path(path.package('varitas'), 'extdata/miniseq')
miniseq.info <- prepare.miniseq.specifications(miniseq.sheet, miniseq.directory)


miniseq.sheet <- file.path(path.package('varitas'), 'extdata/miniseq/Example_template.csv')
miniseq.directory <- file.path(path.package('varitas'), 'extdata/miniseq')
miniseq.info <- prepare.miniseq.specifications(miniseq.sheet, miniseq.directory)

prepare.vcf.specification

Description

Prepare VCF specification data frame for annotation

Usage

prepare.vcf.specification(vcf.details, sample.id.column = 1,
  vcf.column = 2, job.dependency.column = NA, caller.column = NA)
prepare.vcf.specification(vcf.details, sample.id.column = 1,
  vcf.column = 2, job.dependency.column = NA, caller.column = NA)

Arguments

`vcf.details`	Data frame containing details of VCF files
`sample.id.column`	Identifier of column in `vcf.details` containing sample IDs (index or name)
`vcf.column`	Identifier of column in `vcf.details` containing VCF file (index or name)
`job.dependency.column`	Identifier of column in `vcf.details` containing job dependency (index or name)
`caller.column`	Identifier of column in `vcf.details` containing caller (index or name)

Value

Properly formatted VCF details

Process coverageBed reports

Description

Process the coverage reports generated by bedtools coverage tool.

Usage

process.coverage.reports(project.directory)
process.coverage.reports(project.directory)

Arguments

project.directory

Path to project directory. Each sample should have its own subdirectory

Value

final.statistics data frame of coverage statistics generated by parsing through coverage reports

Process sample contamination checks

Description

Takes *selfSM reports generated by VerifyBamID during alignment, and returns a vector of freemix scores. The freemix score is a sequence only estimate of sample contamination that ranges from 0 to 1.

Note: Targeted panels are often too small for this step to work properly.

Usage

process.sample.contamination.checks(project.directory)
process.sample.contamination.checks(project.directory)

Arguments

project.directory

Path to project directory. Each sample should have its own subdirectory

Value

freemix.scores Data frame giving sample contamination (column freemix) score per sample.

References

https://genome.sph.umich.edu/wiki/VerifyBamID

Process total coverage statistics

Description

Process reports generated by flagstat. Assumes reports for before and after off-target filtering have been written to the same file, with separating headers

Usage

process.total.coverage.statistics(project.directory)
process.total.coverage.statistics(project.directory)

Arguments

project.directory

Path to project directory. Each sample should have its own subdirectory

Value

data frame with extracted statistics

read.all.calls

Description

Read all calls made with a certain caller

Usage

read.all.calls(sample.ids, caller = c("vardict", "mutect", "pgm"),
  project.directory, patient.ids = NULL, apply.filters = TRUE,
  variant.file.pattern = NULL)
read.all.calls(sample.ids, caller = c("vardict", "mutect", "pgm"),
  project.directory, patient.ids = NULL, apply.filters = TRUE,
  variant.file.pattern = NULL)

Arguments

`sample.ids`	Vector giving sample IDs to process
`caller`	String indicating which caller was used
`project.directory`	Path to project directory
`patient.ids`	Optional vector giving patient ID (or other group) corresponding to each sample
`apply.filters`	Logical indicating whether filters specified in VariTAS options should be applied. Defaults to TRUE. !
`variant.file.pattern`	Pattern indicating where the variant file can be found. Sample ID should be indicated by SAMPLE_ID

Value

combined.variant.calls Data frame with variant calls from all patients

Read iDES output

Description

Read output from iDES_step1.pl and return data frame

Usage

read.ides.file(filename)
read.ides.file(filename)

Arguments

filename

path to file

Value

ides.data data frame read from iDES output

Read variant calls from file and format for ease of downstream analyses.

Description

Read variant calls from file and format for ease of downstream analyses.

Usage

read.variant.calls(variant.file, variant.caller)
read.variant.calls(variant.file, variant.caller)

Arguments

`variant.file`	Path to variant file.
`variant.caller`	String indicating which variant caller was used. Needed to format the headers.

Value

variant.calls Data frame of variant calls

read.yaml

Description

Read a yaml file

Usage

read.yaml(file.name)
read.yaml(file.name)

Arguments

file.name

Path to yaml file

Value

list containing contents of yaml file

Examples

read.yaml(file.path(path.package('varitas'), 'config.yaml'))

read.yaml(file.path(path.package('varitas'), 'config.yaml'))

Run alignment

Description

Run alignment

Usage

run.alignment(fastq.specification, output.directory, paired.end = FALSE,
  sample.directories = TRUE, output.subdirectory = FALSE,
  job.name.prefix = NULL, job.group = "alignment", quiet = FALSE,
  verify.options = !quiet)
run.alignment(fastq.specification, output.directory, paired.end = FALSE,
  sample.directories = TRUE, output.subdirectory = FALSE,
  job.name.prefix = NULL, job.group = "alignment", quiet = FALSE,
  verify.options = !quiet)

Arguments

`fastq.specification`	Data frame detailing FASTQ files to be processed, typically from prepare.fastq.specification
`output.directory`	Path to project directory
`paired.end`	Logical indicating whether paired-end sequencing was performed
`sample.directories`	Logical indicating whether all sample files should be saved to sample-specific subdirectories (will be created)
`output.subdirectory`	If further nesting is required, name of subdirectory. If no further nesting, set to FALSE
`job.name.prefix`	Prefix for job names on the cluster
`job.group`	Group job should be associated with on cluster
`quiet`	Logical indicating whether to print commands to screen rather than submit them
`verify.options`	Logical indicating whether to run verify.varitas.options

Details

Runs alignment (and related processing steps) on each sample.

Value

None

Examples

run.alignment(
  fastq.specification = data.frame(
    sample.id = c('1', '2'),
    reads = c('1-R1.fastq.gz', '2-R1.fastq.gz'),
    mates = c('1-R2.fastq.gz', '2-R2.fastq.gz'),
    patient.id = c('P1', 'P1'),
    tissue = c('tumour', 'normal')
  ),
  output.directory = '.',
  quiet = TRUE,
  paired.end = TRUE
)

run.alignment(
  fastq.specification = data.frame(
    sample.id = c('1', '2'),
    reads = c('1-R1.fastq.gz', '2-R1.fastq.gz'),
    mates = c('1-R2.fastq.gz', '2-R2.fastq.gz'),
    patient.id = c('P1', 'P1'),
    tissue = c('tumour', 'normal')
  ),
  output.directory = '.',
  quiet = TRUE,
  paired.end = TRUE
)

Run alignment for a single sample

Description

Run alignment for a single sample

Usage

run.alignment.sample(fastq.files, sample.id, output.directory = NULL,
  output.filename = NULL, code.directory = NULL,
  log.directory = NULL, config.file = NULL, job.dependencies = NULL,
  job.name = NULL, job.group = NULL, quiet = FALSE,
  verify.options = !quiet)
run.alignment.sample(fastq.files, sample.id, output.directory = NULL,
  output.filename = NULL, code.directory = NULL,
  log.directory = NULL, config.file = NULL, job.dependencies = NULL,
  job.name = NULL, job.group = NULL, quiet = FALSE,
  verify.options = !quiet)

Arguments

`fastq.files`	Paths to FASTQ files (one file if single-end reads, two files if paired-end)
`sample.id`	Sample ID for labelling
`output.directory`	Path to output directory
`output.filename`	Name of resulting VCF file (defaults to SAMPLE_ID.vcf)
`code.directory`	Path to directory where code should be stored
`log.directory`	Path to directory where log files should be stored
`config.file`	Path to config file
`job.dependencies`	Vector with names of job dependencies
`job.name`	Name of job to be submitted
`job.group`	Group job should belong to
`quiet`	Logical indicating whether to print command to screen rather than submit it to the system. Defaults to false, useful for debugging.
`verify.options`	Logical indicating whether to run verify.varitas.options

Run all the generated bash scripts without HPC commands

Description

Run all the scripts generated by previous parts of the pipeline, without using HPC commands

Usage

run.all.scripts(output.directory, stages.to.run = c("alignment", "qc",
  "calling", "annotation", "merging"), variant.callers = NULL,
  quiet = FALSE)
run.all.scripts(output.directory, stages.to.run = c("alignment", "qc",
  "calling", "annotation", "merging"), variant.callers = NULL,
  quiet = FALSE)

Arguments

`output.directory`	Main directory where all files should be saved
`stages.to.run`	A character vector of all stages that need running
`variant.callers`	A character vector of variant callers to run
`quiet`	Logical indicating whether to print commands to screen rather than submit jobs. Defaults to FALSE, can be useful to set to TRUE for testing.

Value

None

Run annotation on a set of VCF files

Description

Takes a data frame with paths to VCF files, and runs ANNOVAR annotation on each file. To allow for smooth connections with downstream pipeline steps, the function returns a variant specification data frame that can be used as input to merging steps.

Usage

run.annotation(vcf.specification, output.directory = NULL,
  job.name.prefix = NULL, job.group = NULL, quiet = FALSE,
  verify.options = !quiet)
run.annotation(vcf.specification, output.directory = NULL,
  job.name.prefix = NULL, job.group = NULL, quiet = FALSE,
  verify.options = !quiet)

Arguments

`vcf.specification`	Data frame detailing VCF files to be processed, from `prepare.vcf.specification`.
`output.directory`	Path to folder where code and log files should be stored in their respective subdirectories. If not supplied, code and log files will be stored in the directory with each VCF file.
`job.name.prefix`	Prefix to be added before VCF name in job name. Defaults to 'annotate', but should be changed if running multiple callers to avoid
`job.group`	Group job should be associated with on cluster
`quiet`	Logical indicating whether to print commands to screen rather than submit them
`verify.options`	Logical indicating whether to run verify.varitas.options

Value

Data frame with details of variant files

Examples

run.annotation(
  data.frame(
    sample.id = c('a', 'b'),
    vcf = c('a.vcf', 'b.vcf'),
    caller = c('mutect', 'mutect')
  ),
  output.directory = '.',
  quiet = TRUE
)

run.annotation(
  data.frame(
    sample.id = c('a', 'b'),
    vcf = c('a.vcf', 'b.vcf'),
    caller = c('mutect', 'mutect')
  ),
  output.directory = '.',
  quiet = TRUE
)

Run ANNOVAR on a VCF file

Description

Run ANNOVAR on a VCF file

Usage

run.annovar.vcf(vcf.file, output.directory = NULL,
  output.filename = NULL, code.directory = NULL,
  log.directory = NULL, config.file = NULL, job.dependencies = NULL,
  job.group = NULL, job.name = NULL, isis = FALSE, quiet = FALSE,
  verify.options = !quiet)
run.annovar.vcf(vcf.file, output.directory = NULL,
  output.filename = NULL, code.directory = NULL,
  log.directory = NULL, config.file = NULL, job.dependencies = NULL,
  job.group = NULL, job.name = NULL, isis = FALSE, quiet = FALSE,
  verify.options = !quiet)

Arguments

`vcf.file`	Path to VCF file
`output.directory`	Path to output directory
`output.filename`	Name of resulting VCF file (defaults to SAMPLE_ID.vcf)
`code.directory`	Path to directory where code should be stored
`log.directory`	Path to directory where log files should be stored
`config.file`	Path to config file
`job.dependencies`	Vector with names of job dependencies
`job.group`	Group job should belong to
`job.name`	Name of job to be submitted
`isis`	Logical indicating whether VCF files are from the isis (MiniSeq) variant caller
`quiet`	Logical indicating whether to print command to screen rather than submit it to the system. Defaults to false, useful for debugging.
`verify.options`	Logical indicating whether to run verify.varitas.options

Value

None

Run filtering on an ANNOVAR-annotated txt file

Description

Run filtering on an ANNOVAR-annotated txt file

Usage

run.filtering.txt(variant.file, caller = c("consensus", "vardict",
  "ides", "mutect"), output.directory = NULL, output.filename = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, job.group = NULL, quiet = FALSE)
run.filtering.txt(variant.file, caller = c("consensus", "vardict",
  "ides", "mutect"), output.directory = NULL, output.filename = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, job.group = NULL, quiet = FALSE)

Arguments

`variant.file`	Path to variant file
`caller`	String giving variant caller that was used (affects which filters were applied.
`output.directory`	Path to output directory
`output.filename`	Name of resulting VCF file (defaults to SAMPLE_ID.vcf)
`code.directory`	Path to directory where code should be stored
`log.directory`	Path to directory where log files should be stored
`config.file`	Path to config file
`job.dependencies`	Vector with names of job dependencies
`job.group`	Group job should belong to
`quiet`	Logical indicating whether to print command to screen rather than submit it to the system. Defaults to false, useful for debugging.

Run iDES

Description

Run iDES

Usage

run.ides(project.directory, sample.id.pattern = "._S\\d+$",
  sample.ids = NULL, job.dependencies = NULL)
run.ides(project.directory, sample.id.pattern = "._S\\d+$",
  sample.ids = NULL, job.dependencies = NULL)

Arguments

`project.directory`	Directory containing files
`sample.id.pattern`	Regex pattern to match sample IDs
`sample.ids`	Vector of sample IDs
`job.dependencies`	Vector of job dependencies

Details

Run iDES step 1on each sample, to tally up calls by strand. Files are output to a the sample subdirectory

Value

None

Note

Deprecated function for running iDES. Follows previous development package without specification data frames

References

https://cappseq.stanford.edu/ides/

Run LoFreq for a sample

Description

Run LoFreq for a sample

Usage

run.lofreq.sample(tumour.bam, sample.id, paired, normal.bam = NULL,
  output.directory = NULL, output.filename = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, quiet = FALSE, job.name = NULL,
  verify.options = !quiet, job.group = NULL)
run.lofreq.sample(tumour.bam, sample.id, paired, normal.bam = NULL,
  output.directory = NULL, output.filename = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, quiet = FALSE, job.name = NULL,
  verify.options = !quiet, job.group = NULL)

Arguments

`tumour.bam`	Path to tumour sample BAM file.
`sample.id`	Sample ID for labelling
`paired`	Logical indicating whether to do variant calling with a matched normal.
`normal.bam`	Path to normal BAM file if `paired = TRUE`
`output.directory`	Path to output directory
`output.filename`	Name of resulting VCF file (defaults to SAMPLE_ID.vcf)
`code.directory`	Path to directory where code should be stored
`log.directory`	Path to directory where log files should be stored
`config.file`	Path to config file
`job.dependencies`	Vector with names of job dependencies
`quiet`	Logical indicating whether to print command to screen rather than submit it to the system. Defaults to false, useful for debugging.
`job.name`	Name of job to be submitted
`verify.options`	Logical indicating whether to run verify.varitas.options
`job.group`	Group job should belong to

Run MuSE for a sample

Description

Run MuSE for a sample

Usage

run.muse.sample(tumour.bam, sample.id, paired, normal.bam = NULL,
  output.directory = NULL, output.filename = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, quiet = FALSE, job.name = NULL,
  verify.options = !quiet, job.group = NULL)
run.muse.sample(tumour.bam, sample.id, paired, normal.bam = NULL,
  output.directory = NULL, output.filename = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, quiet = FALSE, job.name = NULL,
  verify.options = !quiet, job.group = NULL)

Arguments

`tumour.bam`	Path to tumour sample BAM file.
`sample.id`	Sample ID for labelling
`paired`	Logical indicating whether to do variant calling with a matched normal.
`normal.bam`	Path to normal BAM file if `paired = TRUE`
`output.directory`	Path to output directory
`output.filename`	Name of resulting VCF file (defaults to SAMPLE_ID.vcf)
`code.directory`	Path to directory where code should be stored
`log.directory`	Path to directory where log files should be stored
`config.file`	Path to config file
`job.dependencies`	Vector with names of job dependencies
`quiet`	Logical indicating whether to print command to screen rather than submit it to the system. Defaults to false, useful for debugging.
`job.name`	Name of job to be submitted
`verify.options`	Logical indicating whether to run verify.varitas.options
`job.group`	Group job should belong to

Run MuTect for a sample

Description

Run MuTect for a sample

Usage

run.mutect.sample(tumour.bam, sample.id, paired, normal.bam = NULL,
  output.directory = NULL, output.filename = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, quiet = FALSE, job.name = NULL,
  verify.options = !quiet, job.group = NULL)
run.mutect.sample(tumour.bam, sample.id, paired, normal.bam = NULL,
  output.directory = NULL, output.filename = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, quiet = FALSE, job.name = NULL,
  verify.options = !quiet, job.group = NULL)

Arguments

`tumour.bam`	Path to tumour sample BAM file.
`sample.id`	Sample ID for labelling
`paired`	Logical indicating whether to do variant calling with a matched normal.
`normal.bam`	Path to normal BAM file if `paired = TRUE`
`output.directory`	Path to output directory
`output.filename`	Name of resulting VCF file (defaults to SAMPLE_ID.vcf)
`code.directory`	Path to directory where code should be stored
`log.directory`	Path to directory where log files should be stored
`config.file`	Path to config file
`job.dependencies`	Vector with names of job dependencies
`quiet`	Logical indicating whether to print command to screen rather than submit it to the system. Defaults to false, useful for debugging.
`job.name`	Name of job to be submitted
`verify.options`	Logical indicating whether to run verify.varitas.options
`job.group`	Group job should belong to

run.post.processing

Description

Submit post-processing job to the cluster with appropriate job dependencies

Usage

run.post.processing(variant.specification, output.directory,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.name.prefix = NULL, quiet = FALSE, email = NULL,
  verify.options = !quiet)
run.post.processing(variant.specification, output.directory,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.name.prefix = NULL, quiet = FALSE, email = NULL,
  verify.options = !quiet)

Arguments

`variant.specification`	Data frame specifying files to be processed
`output.directory`	Path to directory where output should be saved
`code.directory`	Directory where code should be saved
`log.directory`	Directory where log files should be saved
`config.file`	Path to config file
`job.name.prefix`	Prefix for job names on the cluster
`quiet`	Logical indicating whether to print commands to screen rather than submit the job
`email`	Email address that should be notified when job finishes. If NULL or FALSE, no email is sent
`verify.options`	Logical indicating whether `verify.varitas.options()` should be run.

Value

None

Examples

run.post.processing(
  variant.specification = data.frame(
    sample.id = c('a', 'b'),
    vcf = c('a.vcf', 'b.vcf'),
    caller = c('mutect', 'mutect'),
    job.dependency = c('example1', 'example2')
  ),
  output.directory = '.',
  quiet = TRUE
)

run.post.processing(
  variant.specification = data.frame(
    sample.id = c('a', 'b'),
    vcf = c('a.vcf', 'b.vcf'),
    caller = c('mutect', 'mutect'),
    job.dependency = c('example1', 'example2')
  ),
  output.directory = '.',
  quiet = TRUE
)

Perform sample QC by looking at target coverage.

Description

Perform sample QC by looking at target coverage.

Usage

run.target.qc(bam.specification, project.directory,
  sample.directories = TRUE, paired = FALSE,
  output.subdirectory = FALSE, quiet = FALSE, job.name.prefix = NULL,
  verify.options = FALSE, job.group = "target_qc")
run.target.qc(bam.specification, project.directory,
  sample.directories = TRUE, paired = FALSE,
  output.subdirectory = FALSE, quiet = FALSE, job.name.prefix = NULL,
  verify.options = FALSE, job.group = "target_qc")

Arguments

`bam.specification`	Data frame containing details of BAM files to be processed, typically from `prepare.bam.specification`.
`project.directory`	Path to project directory where code and log files should be saved
`sample.directories`	Logical indicating whether output for each sample should be put in its own directory (within output.directory)
`paired`	Logical indicating whether the analysis is paired. This does not affect QC directly, but means normal samples get nested
`output.subdirectory`	If further nesting is required, name of subdirectory. If no further nesting, set to FALSE
`quiet`	Logical indicating whether to print commands to screen rather than submit the job
`job.name.prefix`	Prefix for job names on the cluster
`verify.options`	Logical indicating whether to run verify.varitas.options
`job.group`	Group job should be associated with on cluster

Get ontarget reads and run coverage quality control

Description

Get ontarget reads and run coverage quality control

Usage

run.target.qc.sample(bam.file, sample.id, output.directory = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, job.name = NULL, job.group = NULL,
  quiet = FALSE)
run.target.qc.sample(bam.file, sample.id, output.directory = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, job.name = NULL, job.group = NULL,
  quiet = FALSE)

Arguments

`bam.file`	Path to BAM file
`sample.id`	Sample ID for labelling
`output.directory`	Path to output directory
`code.directory`	Path to directory where code should be stored
`log.directory`	Path to directory where log files should be stored
`config.file`	Path to config file
`job.dependencies`	Vector with names of job dependencies
`job.name`	Name of job to be submitted
`job.group`	Group job should belong to
`quiet`	Logical indicating whether to print command to screen rather than submit it to the system. Defaults to false, useful for debugging.

run.vardict.sample

Description

Run VarDict on a sample. Idea: have a low-level function that simply submits job to Perl, after BAM paths have been found. and output paths already have been decided upon

Usage

run.vardict.sample(tumour.bam, sample.id, paired, proton = FALSE,
  normal.bam = NULL, output.directory = NULL, output.filename = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, job.name = NULL, job.group = NULL,
  quiet = FALSE, verify.options = !quiet)
run.vardict.sample(tumour.bam, sample.id, paired, proton = FALSE,
  normal.bam = NULL, output.directory = NULL, output.filename = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, job.name = NULL, job.group = NULL,
  quiet = FALSE, verify.options = !quiet)

Arguments

`tumour.bam`	Path to tumour sample BAM file.
`sample.id`	Sample ID for labelling
`paired`	Logical indicating whether to do variant calling with a matched normal.
`proton`	Logical indicating whether the data was generated by proton sequencing. Defaults to False (i.e. Illumina)
`normal.bam`	Path to normal BAM file if `paired = TRUE`
`output.directory`	Path to output directory
`output.filename`	Name of resulting VCF file (defaults to SAMPLE_ID.vcf)
`code.directory`	Path to directory where code should be stored
`log.directory`	Path to directory where log files should be stored
`config.file`	Path to config file
`job.dependencies`	Vector with names of job dependencies
`job.name`	Name of job to be submitted
`job.group`	Group job should belong to
`quiet`	Logical indicating whether to print command to screen rather than submit it to the system. Defaults to false, useful for debugging.
`verify.options`	Logical indicating whether to run verify.varitas.options

run.variant.calling

Description

Run variant calling for all samples

Usage

run.variant.calling(bam.specification, output.directory,
  variant.callers = c("vardict", "mutect", "varscan", "lofreq", "muse"),
  paired = TRUE, proton = FALSE, sample.directories = TRUE,
  job.name.prefix = NULL, quiet = FALSE, verify.options = !quiet)
run.variant.calling(bam.specification, output.directory,
  variant.callers = c("vardict", "mutect", "varscan", "lofreq", "muse"),
  paired = TRUE, proton = FALSE, sample.directories = TRUE,
  job.name.prefix = NULL, quiet = FALSE, verify.options = !quiet)

Arguments

`bam.specification`	Data frame containing details of BAM files to be processed, typically from `prepare.bam.specification`.
`output.directory`	Path to directory where output should be saved
`variant.callers`	Character vector of variant callers to be used
`paired`	Logical indicating whether to do variant calling with a matched normal
`proton`	Logical indicating whether data was generated by proton sequencing (ignored if running MuTect)
`sample.directories`	Logical indicating whether output for each sample should be put in its own directory (within output.directory)
`job.name.prefix`	Prefix for job names on the cluster
`quiet`	Logical indicating whether to print commands to screen rather than submit the job
`verify.options`	Logical indicating whether to run verify.varitas.options

Details

Run VarDict on each sample, and annotate the results with ANNOVAR. Files are output to a vardict/ subdirectory within each sample directory.

Value

None

Examples

run.variant.calling(
  data.frame(sample.id = c('Z', 'Y'), tumour.bam = c('Z.bam', 'Y.bam')),
  output.directory = '.',
  variant.caller = c('lofreq', 'mutect'),
  quiet = TRUE,
  paired = FALSE
)

run.variant.calling(
  data.frame(sample.id = c('Z', 'Y'), tumour.bam = c('Z.bam', 'Y.bam')),
  output.directory = '.',
  variant.caller = c('lofreq', 'mutect'),
  quiet = TRUE,
  paired = FALSE
)

Run VariTAS pipeline in full.

Description

Run all steps in VariTAS processing pipeline, with appropriate dependencies.

Usage

run.varitas.pipeline(file.details, output.directory, run.name = NULL,
  start.stage = c("alignment", "qc", "calling", "annotation", "merging"),
  variant.callers = NULL, proton = FALSE, quiet = FALSE,
  email = NULL, verify.options = !quiet,
  save.specification.files = !quiet)
run.varitas.pipeline(file.details, output.directory, run.name = NULL,
  start.stage = c("alignment", "qc", "calling", "annotation", "merging"),
  variant.callers = NULL, proton = FALSE, quiet = FALSE,
  email = NULL, verify.options = !quiet,
  save.specification.files = !quiet)

Arguments

`file.details`	Data frame containing details of files to be used during first processing step. Depending on what you want to be the first step in the pipeline, this can either be FASTQ files, BAM files, VCF files, or variant (txt) files.
`output.directory`	Main directory where all files should be saved
`run.name`	Name of pipeline run. Will be added as a prefix to all LSF jobs.
`start.stage`	String indicating which stage pipeline should start at. If starting at a later stage of the pipeline, appropriate input files must be provided. For example, if starting with annotation, VCF files with variant calls must be provided.
`variant.callers`	Vector specifying which variant callers should be run.
`proton`	Logical indicating if data was generated by proton sequencing. Used to set base quality thresholds in variant calling steps.
`quiet`	Logical indicating whether to print commands to screen rather than submit jobs. Defaults to FALSE, can be useful to set to TRUE for testing.
`email`	Email address that should be notified when pipeline finishes. If NULL or FALSE, no email is sent.
`verify.options`	Logical indicating whether to run verify.varitas.options
`save.specification.files`	Logical indicating if specification files should be saved to project directory

Value

None

Examples

run.varitas.pipeline(
       file.details = data.frame(
        sample.id = c('1', '2'),
        reads = c('1-R1.fastq.gz', '2-R1.fastq.gz'),
        mates = c('1-R2.fastq.gz', '2-R2.fastq.gz'),
         patient.id = c('P1', 'P1'),
         tissue = c('tumour', 'normal')
       ),
       output.directory = '.',
       quiet = TRUE,
       run.name = "Test", 
       variant.callers = c('mutect', 'varscan')
     )

run.varitas.pipeline(
       file.details = data.frame(
        sample.id = c('1', '2'),
        reads = c('1-R1.fastq.gz', '2-R1.fastq.gz'),
        mates = c('1-R2.fastq.gz', '2-R2.fastq.gz'),
         patient.id = c('P1', 'P1'),
         tissue = c('tumour', 'normal')
       ),
       output.directory = '.',
       quiet = TRUE,
       run.name = "Test", 
       variant.callers = c('mutect', 'varscan')
     )

run.varitas.pipeline.hybrid

Description

Run VariTAS pipeline starting from both VCF files and BAM/ FASTQ files. Useful for processing data from the Ion PGM or MiniSeq where variant calling has been done on the machine, but you are interested in running more variant callers.

Usage

run.varitas.pipeline.hybrid(vcf.specification, output.directory,
  run.name = NULL, fastq.specification = NULL,
  bam.specification = NULL, variant.callers = c("mutect", "vardict",
  "varscan", "lofreq", "muse"), proton = FALSE, quiet = FALSE,
  email = NULL, verify.options = !quiet,
  save.specification.files = !quiet)
run.varitas.pipeline.hybrid(vcf.specification, output.directory,
  run.name = NULL, fastq.specification = NULL,
  bam.specification = NULL, variant.callers = c("mutect", "vardict",
  "varscan", "lofreq", "muse"), proton = FALSE, quiet = FALSE,
  email = NULL, verify.options = !quiet,
  save.specification.files = !quiet)

Arguments

`vcf.specification`	Data frame containing details of vcf files to be processed. Must contain columns sample.id, vcf, and caller
`output.directory`	Main directory where all files should be saved
`run.name`	Name of pipeline run. Will be added as a prefix to all LSF jobs.
`fastq.specification`	Data frame containing details of FASTQ files to be processed
`bam.specification`	Data frame containing details of BAM files to be processed
`variant.callers`	Vector specifying which variant callers should be run.
`proton`	Logical indicating if data was generated by proton sequencing. Used to set base quality thresholds in variant calling steps.
`quiet`	Logical indicating whether to print commands to screen rather than submit jobs. Defaults to FALSE, can be useful to set to TRUE for testing.
`email`	Email address that should be notified when pipeline finishes. If NULL or FALSE, no email is sent.
`verify.options`	Logical indicating whether to run verify.varitas.options
`save.specification.files`	Logical indicating if specification files should be saved to project directory

Value

None

Examples

run.varitas.pipeline.hybrid(
       bam.specification = data.frame(sample.id = c('Z', 'Y'), tumour.bam = c('Z.bam', 'Y.bam')),
       vcf.specification = data.frame(
         sample.id = c('a', 'b'),
         vcf = c('a.vcf', 'b.vcf'),
         caller = c('pgm', 'pgm')
       ),
       output.directory = '.',
       quiet = TRUE,
       run.name = "Test", 
       variant.callers = c('mutect', 'varscan')
     )

run.varitas.pipeline.hybrid(
       bam.specification = data.frame(sample.id = c('Z', 'Y'), tumour.bam = c('Z.bam', 'Y.bam')),
       vcf.specification = data.frame(
         sample.id = c('a', 'b'),
         vcf = c('a.vcf', 'b.vcf'),
         caller = c('pgm', 'pgm')
       ),
       output.directory = '.',
       quiet = TRUE,
       run.name = "Test", 
       variant.callers = c('mutect', 'varscan')
     )

Run VarScan for a sample

Description

Run VarScan for a sample

Usage

run.varscan.sample(tumour.bam, sample.id, paired, normal.bam = NULL,
  output.directory = NULL, output.filename = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, quiet = FALSE, job.name = NULL,
  verify.options = !quiet, job.group = NULL)
run.varscan.sample(tumour.bam, sample.id, paired, normal.bam = NULL,
  output.directory = NULL, output.filename = NULL,
  code.directory = NULL, log.directory = NULL, config.file = NULL,
  job.dependencies = NULL, quiet = FALSE, job.name = NULL,
  verify.options = !quiet, job.group = NULL)

Arguments

`tumour.bam`	Path to tumour sample BAM file.
`sample.id`	Sample ID for labelling
`paired`	Logical indicating whether to do variant calling with a matched normal.
`normal.bam`	Path to normal BAM file if `paired = TRUE`
`output.directory`	Path to output directory
`output.filename`	Name of resulting VCF file (defaults to SAMPLE_ID.vcf)
`code.directory`	Path to directory where code should be stored
`log.directory`	Path to directory where log files should be stored
`config.file`	Path to config file
`job.dependencies`	Vector with names of job dependencies
`quiet`	Logical indicating whether to print command to screen rather than submit it to the system. Defaults to false, useful for debugging.
`job.name`	Name of job to be submitted
`verify.options`	Logical indicating whether to run verify.varitas.options
`job.group`	Group job should belong to

save.config

Description

Save current varitas config options to a temporary file, and return filename.

Usage

save.config(output.file = NULL)
save.config(output.file = NULL)

Arguments

output.file

Path to output file. If NULL (default), the config file will be saved as a temporary file.

Value

Path to config file

Save coverage statistics to multi-worksheet Excel file.

Description

Save coverage statistics to multi-worksheet Excel file.

Usage

save.coverage.excel(project.directory, file.name, overwrite = TRUE)
save.coverage.excel(project.directory, file.name, overwrite = TRUE)

Arguments

`project.directory`	Path to project directory
`file.name`	Name of output file
`overwrite`	Logical indicating whether to overwrite existing file if it exists.

Value

None

Save variants to Excel.

Description

Makes an Excel workbook with variant calls. If filters are provided, these will be saved to an additional worksheet within the same file.

Usage

save.variants.excel(variants, file.name, filters = NULL,
  overwrite = TRUE)
save.variants.excel(variants, file.name, filters = NULL,
  overwrite = TRUE)

Arguments

`variants`	Data frame containing variants
`file.name`	Name of output file
`filters`	Optional list of filters to be saved
`overwrite`	Logical indicating whether to overwrite exiting file if it exists. Defaults to TRUE for consistency with other R functions.

Set options for varitas pipeline.

Description

Set or overwrite options for the VariTAS pipeline. Nested options should be separated by a dot. For example, to update the reference genome for grch38, use reference_genome.grch38

Usage

set.varitas.options(...)
set.varitas.options(...)

Arguments

...

options to set

Value

None

Examples

## Not run: 
set.varitas.options(reference_build = 'grch38'); 	
	set.varitas.options(
	filters.mutect.min_normal_depth = 10,
	filters.vardict.min_normal_depth = 10
	);

## End(Not run)
## Not run: 
set.varitas.options(reference_build = 'grch38'); 	
	set.varitas.options(
	filters.mutect.min_normal_depth = 10,
	filters.vardict.min_normal_depth = 10
	);

## End(Not run)

split.on.column

Description

Split data frame on a concatenated column.

Usage

## S3 method for class 'on.column'
split(dat, column, split.character)
## S3 method for class 'on.column'
split(dat, column, split.character)

Arguments

`dat`	Data frame to be processed
`column`	Name of column to split on
`split.character`	Pattern giving character to split column on

Value

Data frame after splitting on column

sum.dp4

Description

Simply calculates the depth of coverage of the variant allele given a string of DP4 values

Usage

## S3 method for class 'dp4'
sum(dp4.str)
## S3 method for class 'dp4'
sum(dp4.str)

Arguments

dp4.str

String of DP4 values in the form "1234,1234,1234,1234"

Run ls command

Description

Runs ls command on system. This is a workaround since list.files can not match patterns based on subdirectory structure.

Usage

system.ls(pattern = "", directory = "", error = FALSE)
system.ls(pattern = "", directory = "", error = FALSE)

Arguments

`pattern`	pattern to match files
`directory`	base directory command should be run from
`error`	logical indicating whether to throw an error if no matching founds found. Defaults to False.

Value

paths returned by ls command

tabular.mean

Description

Calculate the mean of data in tabular format

Usage

tabular.mean(values, frequencies, ...)
tabular.mean(values, frequencies, ...)

Arguments

`values`	vector of values
`frequencies`	frequency corresponding to each value
`...`	Additional parameters passed to `sum`

Value

calculated mean

tabular.median

Description

Calculate the median of data in tabular format

Usage

tabular.median(values, frequencies, ...)
tabular.median(values, frequencies, ...)

Arguments

`values`	Vector of values
`frequencies`	Frequency corresponding to each value
`...`	Additional parameters passed to `sum`

Value

calculated median

Make barplot of trinucleotide substitutions

Description

Make barplot of trinucleotide substitutions

Usage

trinucleotide.barplot(variants, file.name)
trinucleotide.barplot(variants, file.name)

Arguments

`variants`	Data frame with variants
`file.name`	Name of output file

Value

None

Make barplot of variants per caller

Description

Make barplot of variants per caller

Usage

variant.recurrence.barplot(variants, file.name)
variant.recurrence.barplot(variants, file.name)

Arguments

`variants`	Data frame with variants
`file.name`	Name of output file

Value

None

Make barplot of variants per caller

Description

Make barplot of variants per caller

Usage

variants.caller.barplot(variants, file.name, group.by = NULL)
variants.caller.barplot(variants, file.name, group.by = NULL)

Arguments

`variants`	Data frame with variants
`file.name`	Name of output file
`group.by`	Optional grouping variable for barplot

Value

None

Make barplot of variants per sample

Description

Make barplot of variants per sample

Usage

variants.sample.barplot(variants, file.name)
variants.sample.barplot(variants, file.name)

Arguments

`variants`	Data frame with variants
`file.name`	Name of output file

Value

None

Check that sample specification data frame matches expected format, and that all files exist

Description

Check that sample specification data frame matches expected format, and that all files exist

Usage

verify.bam.specification(bam.specification)
verify.bam.specification(bam.specification)

Arguments

bam.specification

Data frame containing columns sample.id and tumour.bam, and optionally a column normal.bam.

Value

None

verify.bwa.index

Description

Verify that bwa index files exist for a fasta file

Usage

verify.bwa.index(fasta.file, error = FALSE)
verify.bwa.index(fasta.file, error = FALSE)

Arguments

`fasta.file`	Fasta file to check
`error`	Logical indicating whether to throw an (informative) error if verification fails

Value

index.files.exist Logical indicating if bwa index files were found (only returned if error set to FALSE)

verify.fasta.index

Description

Verify that fasta index files exist for a given fasta file.

Usage

verify.fasta.index(fasta.file, error = FALSE)
verify.fasta.index(fasta.file, error = FALSE)

Arguments

`fasta.file`	Fasta file to check
`error`	Logical indicating whether to throw an (informative) error if verification fails

Value

faidx.exists Logical indicating if fasta index files were found (only returned if error set to FALSE)

Check that FASTQ specification data frame matches expected format, and that all files exist

Description

Check that FASTQ specification data frame matches expected format, and that all files exist

Usage

verify.fastq.specification(fastq.specification, paired.end = FALSE,
  files.ready = FALSE)
verify.fastq.specification(fastq.specification, paired.end = FALSE,
  files.ready = FALSE)

Arguments

`fastq.specification`	Data frame containing columns sample.id and reads, and optionally a column mates
`paired.end`	Logical indicating whether paired end reads are used
`files.ready`	Logical indicating if the files already exist on disk. If there are job dependencies, this should be set to FALSE.

Value

None

verify.sequence.dictionary

Description

Verify that sequence dictionary exists for a fasta file.

Usage

verify.sequence.dictionary(fasta.file, error = FALSE)
verify.sequence.dictionary(fasta.file, error = FALSE)

Arguments

`fasta.file`	Fasta file to check
`error`	Logical indicating whether to throw an (informative) error if verification fails

Value

dict.exists Logical indicating if sequence dictionary files were found (only returned if error set to FALSE)

Check against common errors in the VariTAS options.

Description

Check against common errors in the VariTAS options before launching into pipeline

Usage

verify.varitas.options(stages.to.run = c("alignment", "qc", "calling",
  "annotation", "merging"), variant.callers = c("mutect", "vardict",
  "ides", "varscan", "lofreq", "muse"), varitas.options = NULL)
verify.varitas.options(stages.to.run = c("alignment", "qc", "calling",
  "annotation", "merging"), variant.callers = c("mutect", "vardict",
  "ides", "varscan", "lofreq", "muse"), varitas.options = NULL)

Arguments

`stages.to.run`	Vector indicating which stages should be run. Defaults to all possible stages. If only running a subset of stages, only checks corresponding to the desired stages are run
`variant.callers`	Vector indicating which variant callers to run. Only used if calling is in `stages.to.run`.
`varitas.options`	Optional file path or list of VariTAS options.

Value

None

verify.vcf.specification

Description

Verify that VCF specification data frame fits expected format

Usage

verify.vcf.specification(vcf.specification)
verify.vcf.specification(vcf.specification)

Arguments

vcf.specification

VCF specification data frame

Value

None

`name`	Option name. Nesting is indicated by character specified in nesting.character.
`value`	New value of option
`old.options`	Nested list the option should be added to
`nesting.character`	String giving Regex pattern of nesting indication string. Defaults to '\.'

Package 'varitas'

Help Index

add.option

Description

Usage

Arguments

Value

alternate.gene.sort

Description

Usage

Arguments

Details

Value

build.variant.specification

Description

Usage

Arguments

Details

Value

. Make Venn diagram of variant caller overlap

Description

Usage

Arguments

capitalize.caller

Description

Usage

Arguments

Value

classify.variant

Description

Usage

Arguments

Value

Convert output of iDES step 1 to variant call format

Description

Usage

Arguments

Value

create.directories

Description

Usage

Arguments

date.stamp.file.name

Description

Usage

Arguments

Value

Examples

Extract sample IDs from file paths

Description

Usage

Arguments

Value

Filter variants in file.

Description

Usage

Arguments

Value

Filter variant calls

Description

Usage

Arguments

Value

fix.lofreq.af

Description

Usage

Arguments

Fix variant call column names

Description

Usage

Arguments

Value

fix.varscan.af

Description

Usage

Arguments

Get base substitution

Description

Usage

Arguments