Title: | Variant Calling in Targeted Analysis Sequencing Data |
---|---|
Description: | Multi-caller variant analysis pipeline for targeted analysis sequencing (TAS) data. Features a modular, automated workflow that can start with raw reads and produces a user-friendly PDF summary and a spreadsheet containing consensus variant information. |
Authors: | Adam Mills [aut, cre], Erle Holgersen [aut], Ros Cutts [aut], Syed Haider [aut] |
Maintainer: | Adam Mills <[email protected]> |
License: | GPL-2 |
Version: | 0.0.2 |
Built: | 2024-12-12 07:04:55 UTC |
Source: | CRAN |
Add an option to a nested list of options. Applied recursively.
add.option(name, value, old.options, nesting.character = "\\.")
name |
Option name. Nesting is indicated by character specified in nesting.character. |
value |
New value of option |
old.options |
Nested list the option should be added to |
nesting.character |
String giving Regex pattern of nesting indication string. Defaults to '\.' |
Nested list with updated options
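The recursive update described above can be illustrated with a minimal base R sketch. The helper name set.nested.option and the option names are hypothetical; this is not the package's implementation, only one way to obtain the documented behaviour.

```r
# Hypothetical helper: set a value in a nested list, splitting the option
# name on the nesting pattern and recursing one level per key.
set.nested.option <- function(name, value, old.options, nesting.character = "\\.") {
    keys <- strsplit(name, split = nesting.character)[[1]]
    set.keys <- function(keys, options) {
        # create missing intermediate levels on the way down
        if (is.null(options)) options <- list()
        if (length(keys) == 1) {
            options[[keys]] <- value
        } else {
            options[[keys[1]]] <- set.keys(keys[-1], options[[keys[1]]])
        }
        options
    }
    set.keys(keys, old.options)
}

# Illustrative option names only
opts <- list(filters = list(mutect = list(min_depth = 10)))
updated <- set.nested.option("filters.mutect.min_depth", 20, opts)
added <- set.nested.option("filters.vardict.min_depth", 5, opts)
```

Recursing on the vector of keys rather than on the joined string keeps the sketch independent of which nesting pattern is used.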
Given a data frame containing coverage statistics and gene information, returns that frame with the rows sorted by alternating gene size (for plotting)
alternate.gene.sort(coverage.statistics)
coverage.statistics |
Data frame of coverage statistics |
Genes have varying numbers of associated amplicons and when plotting coverage statistics, if two genes with very low numbers of amplicons are next to each other, the labels will overlap. This function sorts the coverage statistics data frame in a way that places the genes with the most amplicons (largest) next to those with the least (smallest).
Coverage statistics data frame sorted by alternating gene size
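As a rough illustration of the alternating order, the sketch below interleaves the largest and smallest genes. The helper alternate.order and the amplicon counts are made up for the example, and the package may break ties or order the pairs differently.

```r
# Hypothetical helper: return indices that alternate largest, smallest,
# second largest, second smallest, and so on.
alternate.order <- function(sizes) {
    decreasing <- order(sizes, decreasing = TRUE)
    low <- 1
    high <- length(decreasing)
    interleaved <- integer(0)
    while (low <= high) {
        interleaved <- c(interleaved, decreasing[low])
        if (low < high) interleaved <- c(interleaved, decreasing[high])
        low <- low + 1
        high <- high - 1
    }
    interleaved
}

# Made-up amplicon counts per gene
amplicons.per.gene <- c(TP53 = 12, KRAS = 2, BRCA1 = 30, NRAS = 1, EGFR = 8)
sorted.genes <- names(amplicons.per.gene)[alternate.order(amplicons.per.gene)]
# "BRCA1" "NRAS" "TP53" "KRAS" "EGFR"
```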
Build data frame with paths to variant files.
build.variant.specification(sample.ids, project.directory)
sample.ids |
Vector of sample IDs. Must match subdirectories in project.directory. |
project.directory |
Path to directory where sample subdirectories are located |
Parses through sample IDs in a project directory and returns paths to variant files based on (theoretical) file name patterns. Useful for testing, or for entering the pipeline at non-traditional stages.
Data frame with paths to variant files.
Make Venn diagram of variant caller overlap
caller.overlap.venn.diagram(variants, file.name)
variants |
Data frame containing variants, typically from merge.variants function |
file.name |
Name of output file |
Capitalize variant caller name
capitalize.caller(caller)
capitalise.caller(caller)
caller |
Character vector of callers to be capitalized |
Vector of same length as caller where eligible callers have been capitalized
Classify a variant as SNV, MNV, or indel based on the reference and alternative alleles
classify.variant(ref, alt)
ref |
Vector of reference bases |
alt |
Vector of alternate bases |
Character vector giving type of variant.
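One plausible way to derive the three classes from allele lengths is sketched below. The helper classify.by.allele.length is hypothetical, and the handling of "-" as an empty allele (as used in ANNOVAR-style output) is an assumption rather than documented behaviour of classify.variant.

```r
# Hypothetical helper: classify by comparing REF and ALT lengths.
classify.by.allele.length <- function(ref, alt) {
    # assumption: "-" marks an empty allele, so strip it before counting bases
    ref.length <- nchar(gsub("-", "", ref, fixed = TRUE))
    alt.length <- nchar(gsub("-", "", alt, fixed = TRUE))
    ifelse(
        ref.length != alt.length,
        "indel",
        ifelse(ref.length == 1, "SNV", "MNV")
    )
}

variant.types <- classify.by.allele.length(
    ref = c("A", "AT", "A", "-"),
    alt = c("G", "GC", "ATT", "T")
)
# "SNV" "MNV" "indel" "indel"
```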
Convert output of iDES step 1 to variant call format
convert.ides.output(filename, output = TRUE, output.suffix = ".calls.txt", minreads = 5, mindepth = 50)
filename |
Path to file |
output |
Logical indicating whether output should be saved to file. Defaults to true. |
output.suffix |
Suffix to be appended to input filename if saving results to file |
minreads |
Minimum numbers of reads |
mindepth |
Minimum depth |
potential.calls Data frame of converted iDES calls
Create directories in a given path
create.directories(directory.names, path)
directory.names |
Vector of names of directories to be created |
path |
Path where directories should be created |
Prefix file name with a date-stamp.
date.stamp.file.name(file.name, date = Sys.Date(), separator = "_")
file.name |
File name to be date-stamped |
date |
Date to be added. Defaults to current date. |
separator |
String that should separate the date from the file name. Defaults to a single underscore. |
String giving the datestamped file name
date.stamp.file.name('plot.png')
date.stamp.file.name('yesterdays_plot.png', date = Sys.Date() - 1)
Extract sample IDs from a set of paths to files in sample-specific subfolders
extract.sample.ids(paths, from.filename = FALSE)
paths |
vector of file paths |
from.filename |
Logical indicating whether sample ID should be extracted from filename rather than path |
vector of extracted sample IDs
Filter variants from file, and save to output. Wrapper function that opens the variant file, calls filter.variants, and saves the result to file
filter.variant.file(variant.file, output.file, config.file = NULL, caller = c("vardict", "ides", "mutect", "pgm", "consensus"))
variant.file |
Path to variant file |
output.file |
Path to output file |
config.file |
Path to config file to be used. If not supplied, will use the pre-existing VariTAS options. |
caller |
Name of caller used (needed to match appropriate filters from settings) |
None
Filter data frame of variant calls based on thresholds specified in settings.
filter.variants(variants, caller = c("vardict", "ides", "mutect", "pgm", "consensus", "isis", "varscan", "lofreq"), config.file = NULL, verbose = FALSE)
variants |
Data frame of variant calls with ANNOVAR annotation, or path to variant file. |
caller |
Name of caller used (needed to match appropriate filters from settings) |
config.file |
Path to config file to be used. If not supplied, will use the pre-existing VariTAS options. |
verbose |
Logical indicating whether to output descriptions of filtering steps. Defaults to FALSE; setting it to TRUE is useful for debugging. |
filtered.variants Data frame of filtered variants
LoFreq does not output allele frequencies, so this script calculates them from the DP (depth) and AD (variant allele depth) values, which are also not output in a convenient format, and adds them to the annotated VCF.
fix.lofreq.af(variant.specification)
variant.specification |
Data frame of variant file information |
Fix headers of variant calls to prepare for merging. This mostly consists of making sure the column headers will be unique by prefixing them with the variant caller in question.
fix.names(column.names, variant.caller, sample.id = NULL)
column.names |
Character vector of column names |
variant.caller |
String giving name of variant caller |
sample.id |
Optional sample ID. Used to fix headers. |
new.column.names Vector of column names after fixing
VarScan does not output allele frequencies, so this script calculates them from the DP (depth) and AD (variant allele depth) values and adds them to the annotated vcf.
fix.varscan.af(variant.specification)
variant.specification |
Data frame of variant file information |
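The underlying arithmetic is the ratio of variant allele depth to total depth. The sketch below shows it on made-up numbers; how the package parses DP and AD out of the VCF is not shown here, and returning NA for zero depth is an assumption.

```r
# Made-up DP (total depth) and AD (variant allele depth) values
depth <- c(200, 150, 0)
variant.allele.depth <- c(20, 3, 0)

# allele frequency = AD / DP; guard against division by zero depth
allele.frequency <- ifelse(depth > 0, variant.allele.depth / depth, NA_real_)
# 0.10 0.02 NA
```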
Get base substitution represented by pyrimidine in base pair. If more than one base in REF/ALT (i.e. MNV or indel rather than SNV), NA will be returned
get.base.substitution(ref, alt)
ref |
Vector of reference bases |
alt |
Vector of alternate bases |
base.substitutions
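Representing a substitution by the pyrimidine of the base pair means complementing both alleles whenever the reference base is a purine (A or G). The sketch below illustrates this; the helper name and the "C>T" output format are assumptions, not necessarily what get.base.substitution returns.

```r
# Hypothetical helper: pyrimidine-centred substitution for SNVs, NA otherwise.
pyrimidine.substitution <- function(ref, alt) {
    complement <- c(A = "T", C = "G", G = "C", T = "A")
    is.snv <- nchar(ref) == 1 & nchar(alt) == 1 &
        ref %in% names(complement) & alt %in% names(complement)

    # complement both alleles when the reference base is a purine
    purine.ref <- is.snv & ref %in% c("A", "G")
    ref[purine.ref] <- unname(complement[ref[purine.ref]])
    alt[purine.ref] <- unname(complement[alt[purine.ref]])

    substitution <- rep(NA_character_, length(ref))
    substitution[is.snv] <- paste0(ref[is.snv], ">", alt[is.snv])
    substitution
}

substitutions <- pyrimidine.substitution(
    ref = c("G", "C", "AT", "A"),
    alt = c("A", "T", "A", "C")
)
# "C>T" "C>T" NA "T>G"
```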
Extract chromosomes from bed file
get.bed.chromosomes(bed)
bed |
Path to BED file |
Vector containing all chromosomes in BED file
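For orientation, the chromosome is the first column of a BED file, so the same information can be obtained with base R as sketched below. The temporary file and its contents are made up, and this assumes a tab-separated BED file without header lines.

```r
# Write a small made-up BED file to a temporary location
bed.file <- tempfile(fileext = ".bed")
writeLines(
    c("chr1\t100\t200\tamplicon1", "chr1\t300\t400\tamplicon2", "chr17\t50\t150\tamplicon3"),
    bed.file
)

# The first column holds the chromosome of each interval
bed.data <- read.table(bed.file, sep = "\t", stringsAsFactors = FALSE)
bed.chromosomes <- unique(bed.data[[1]])
# "chr1" "chr17"
```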
Get build version (hg19/hg38) based on settings.
Parses VariTAS pipeline settings to get the build version. When this function was first developed, the idea was to be able to explicitly set ANNOVAR filenames based on the build version.
get.buildver()
String giving reference genome build version (hg19 or hg38)
Generate a colour scheme
get.colours(n)
n |
Number of colours desired |
Colour.scheme generated colours
Parse coverageBed output to get coverage by amplicon
get.coverage.by.amplicon(project.directory)
project.directory |
Path to project directory. Each sample should have its own subdirectory |
combined.data Data frame giving coverage per amplicon per sample.
http://bedtools.readthedocs.io/en/latest/content/tools/coverage.html
Get statistics about coverage per sample
get.coverage.by.sample.statistics(project.directory)
project.directory |
Path to project directory. Each sample should have its own subdirectory |
coverage.by.sample.statistics Data frame with coverage statistics per sample
Extract chromosomes from fasta headers.
get.fasta.chromosomes(fasta)
fasta |
Path to reference fasta |
Vector containing all chromosomes in fasta file.
Get absolute path to sample-specific file for one or more samples
get.file.path(sample.ids, directory, extension = NULL, allow.multiple = FALSE, allow.none = FALSE)
sample.ids |
Vector of sample IDs to match filename on |
directory |
Path to directory containing files |
extension |
String giving extension of file |
allow.multiple |
Boolean indicating whether to allow multiple matching files. Defaults to false, which throws an error if the query matches more than one file. |
allow.none |
Boolean indicating whether to allow no matching files. Defaults to false, which throws an error if the query does not match any files. |
Paths to matched files
Determine filters per caller, given default and caller-specific values.
get.filters(filters)
filters |
List of filter values. These will be updated to use default as the baseline, with caller-specific filters taking precedence if supplied. |
A list with updated filters
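The precedence rule (defaults as baseline, caller-specific values on top) can be sketched with base R's modifyList. The filter names and values below are invented for illustration, and the shape of the real filters list may differ.

```r
# Invented filter names and thresholds
filters <- list(
    default = list(min_tumour_depth = 10, max_normal_allele_frequency = 0.05),
    vardict = list(min_tumour_depth = 20),
    mutect = list()
)

# Start each caller from the defaults, then overlay its own settings
callers <- setdiff(names(filters), "default")
resolved.filters <- lapply(
    filters[callers],
    function(caller.filters) modifyList(filters$default, caller.filters)
)
```

Here resolved.filters$vardict keeps its own min_tumour_depth of 20, while resolved.filters$mutect inherits both default values.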
Use guesswork to extract gene from data frame of targeted panel data. The panel designer output can change, so try to guess what the format is.
get.gene(bed.data)
bed.data |
Data frame containing data from bed file |
vector of gene names, one entry for each row of bed.data
Get files for a sample in a directory, ensuring there's only a single match per sample ID.
get.miniseq.sample.files(sample.ids, directory, file.suffix = "_S\\d{1,2}_.*")
sample.ids |
Vector of sample ids. Should form first part of file name |
directory |
Directory where files can be found |
file.suffix |
Regular expression for end of file name. For example, file.suffix = '_S\\d{1,2}_.*_R1_.*' will match R1 files. |
Character vector of file paths
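To see how such a suffix pattern narrows the match, the sketch below combines a sample ID with the R1 suffix and applies it to made-up file names. Anchoring the sample ID at the start of the name is an assumption about how the function builds its pattern.

```r
# Made-up MiniSeq-style file names
files <- c(
    "A1_S1_L001_R1_001.fastq.gz",
    "A1_S1_L001_R2_001.fastq.gz",
    "A10_S12_L001_R1_001.fastq.gz"
)

# Sample ID forms the first part of the name, followed by the suffix pattern
sample.id <- "A1"
file.suffix <- "_S\\d{1,2}_.*_R1_.*"
matched.files <- grep(paste0("^", sample.id, file.suffix), files, value = TRUE)
# "A1_S1_L001_R1_001.fastq.gz"
```

Because the suffix starts with an underscore, sample A1 does not pick up the files belonging to sample A10.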
Helper function to recursively get a VariTAS option
get.option(name, varitas.options = NULL, nesting.character = "\\.")
name |
Option name |
varitas.options |
Optional list of options to search in |
nesting.character |
String giving Regex pattern of nesting indication string. Defaults to '\.' |
value Requested option
Summarise panel coverage by gene
get.panel.coverage.by.gene(panel.file, gene.col = 5)
panel.file |
path to panel |
gene.col |
index of column containing gene name |
panel.coverage.by.gene data frame giving the number of amplicons and their total length by gene
The BED files are not consistent, so it's not clear where the pool will appear. This function parses through the columns to identify where the pool is.
get.pool.from.panel.data(panel.data)
panel.data |
data frame pool should be extracted from |
pools vector of pool information
Return VariTAS settings
get.varitas.options(option.name = NULL, nesting.character = "\\.")
option.name |
Optional name of option. If no name is supplied, the full list of VariTAS options will be provided. |
nesting.character |
String giving Regex pattern of nesting indication string. Defaults to '\.' |
varitas.options list specifying VariTAS options
reference.build <- get.varitas.options('reference_build')
mutect.filters <- get.varitas.options('filters.mutect')
Extract chromosomes from a VCF file.
get.vcf.chromosomes(vcf)
vcf |
Path to VCF file |
Vector containing all chromosomes in VCF
Check if a key is in VariTAS options
in.varitas.options(option.name = NULL, varitas.options = NULL, nesting.character = "\\.")
option.name |
String giving name of option (with different levels joined by |
varitas.options |
Ampliseq options as a list. If missing, they will be obtained from |
nesting.character |
String giving Regex pattern of nesting indication string. Defaults to '\.' |
in.options Boolean indicating if the option name exists in the current varitas options
Convert a logical vector to a T/F coded character vector. Useful for preventing unwanted T->TRUE nucleotide conversions
logical.to.character(x)
x |
Vector to be converted |
Character vector after converting TRUE/FALSE
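The conversion problem arises because R's type guessing reads a column consisting only of "T" as logical TRUE. The sketch below reproduces this with a made-up REF column and converts it back; the ifelse call is one straightforward way to do so, not necessarily how logical.to.character is written.

```r
# A column containing only thymine is read as logical TRUE
parsed <- read.csv(text = "REF\nT\nT")
column.is.logical <- is.logical(parsed$REF)

# Convert back to T/F coded characters, keeping NA as NA
restored <- ifelse(parsed$REF, "T", "F")
with.missing <- ifelse(c(TRUE, FALSE, NA), "T", "F")
```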
Make string with command line call from its individual components
make.command.line.call(main.command, options = NULL, flags = NULL, option.prefix = "--", option.separator = " ", flag.prefix = "--")
main.command |
String or vector of strings giving main part of command (e.g. "python test.py" or c("python", "test.py")) |
options |
Named vector or list giving options |
flags |
Vector giving flags to include. |
option.prefix |
String to preface all options. Defaults to "--" |
option.separator |
String to separate options from their values. Defaults to a single space. |
flag.prefix |
String to preface all flags. Defaults to "--" |
command string giving command line call
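How the pieces might be assembled is sketched below with a stand-in function, build.command, that mirrors the documented arguments. It is an approximation for illustration; the ordering of options and flags in the real function is not confirmed by the text above.

```r
# Hypothetical stand-in that joins the main command, options, and flags.
build.command <- function(
    main.command,
    options = NULL,
    flags = NULL,
    option.prefix = "--",
    option.separator = " ",
    flag.prefix = "--"
) {
    parts <- paste(main.command, collapse = " ")
    if (length(options) > 0) {
        # each option becomes prefix + name + separator + value
        option.strings <- paste0(
            option.prefix, names(options), option.separator, unlist(options)
        )
        parts <- c(parts, option.strings)
    }
    if (length(flags) > 0) {
        parts <- c(parts, paste0(flag.prefix, flags))
    }
    paste(parts, collapse = " ")
}

command <- build.command(
    c("python", "test.py"),
    options = list(input = "a.vcf", threads = 4),
    flags = "verbose"
)
# "python test.py --input a.vcf --threads 4 --verbose"
```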
Get mean value of a variant annotation field
## S3 method for class 'field.value'
mean(variants, field = c("TUMOUR.DP", "NORMAL.DP", "NORMAL.AF", "TUMOUR.AF", "QUAL"), caller = c("consensus", "vardict", "pgm", "mutect", "isis", "varscan", "lofreq"))
variants |
Data frame with variants |
field |
String giving field of interest. |
caller |
String giving caller to calculate values from |
As part of the variant merging process, annotated variant data frames are merged into one, with the value from each caller prefixed by CALLER. For example, the VarDict normal allele frequency will have header VARDICT.NORMAL.AF. This function takes the average of all callers' values for a given field, removing NAs. If only a single caller is present in the data frame, that value is returned.
Vector of mean values.
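The consensus averaging can be pictured as a row-wise mean over the caller-prefixed columns for one field, ignoring missing values. The data frame below is made up, and selecting columns by a regular expression is an assumption about how the columns are located.

```r
# Made-up merged calls with caller-prefixed tumour allele frequencies
variants <- data.frame(
    VARDICT.TUMOUR.AF = c(0.10, NA, 0.30),
    MUTECT.TUMOUR.AF = c(0.20, 0.40, NA)
)

# Average across callers per variant, dropping NAs
field.columns <- grep("\\.TUMOUR\\.AF$", names(variants), value = TRUE)
mean.tumour.af <- rowMeans(variants[, field.columns, drop = FALSE], na.rm = TRUE)
# 0.15 0.40 0.30
```

With drop = FALSE the same code also works when only a single caller column is present, in which case each mean is just that caller's value.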
Merge potential iDES calls with variant annotation.
## S3 method for class 'ides.annotation'
merge(ides.filename, output = TRUE, output.suffix = ".ann.txt", annovar.suffix.pattern = ".annovar.hg(\\d{2})_multianno.txt")
ides.filename |
Path to formatted iDES output (typically from convert.ides.output file) |
output |
Logical indicating whether output should be saved to file. Defaults to true. |
output.suffix |
Suffix to be appended to input filename if saving results to file |
annovar.suffix.pattern |
Suffix to match ANNOVAR file |
The VarDict variant calling includes a GATK call merging the call vcf file (allele frequency information etc.) with the ANNOVAR annotation, and saving the result as a table. This function is an attempt to emulate that step for the iDES calls.
annotated.calls Data frame of annotations and iDES output.
Merge variants from multiple callers and return a data frame of merged calls. By default filtering is also applied, although this behaviour can be turned off by setting apply.filters to FALSE.
## S3 method for class 'variants'
merge(variant.specification, apply.filters = TRUE, remove.structural.variants = TRUE, separate.consensus.filters = FALSE, verbose = FALSE)
variant.specification |
Data frame containing details of file paths, sample IDs, and caller. |
apply.filters |
Logical indicating whether to apply filters. Defaults to TRUE. |
remove.structural.variants |
Logical indicating whether structural variants (including CNVs) should be removed. Defaults to TRUE. |
separate.consensus.filters |
Logical indicating whether to apply different thresholds to variants called by more than one caller (specified under consensus in config file). Defaults to FALSE. |
verbose |
Logical indicating whether to print information to screen |
Data frame
Overwrite VariTAS options with options provided in config file.
overwrite.varitas.options(config.file)
config.file |
Path to config file that should be used to overwrite options |
None
## Not run:
config <- file.path(path.package('varitas'), 'config.yaml')
overwrite.varitas.options(config)
## End(Not run)
Parse job dependencies to make the functions more robust to alternate inputs (e.g. people writing alignment instead of bwa)
parse.job.dependencies(dependencies)
dependencies |
Job dependency strings to be parsed. |
parsed.dependencies Vector of job dependencies after reformatting.
Create one scatterplot per sample, showing coverage per amplicon, and an additional plot giving the median
## S3 method for class 'amplicon.coverage.per.sample'
plot(coverage.statistics, output.directory)
coverage.statistics |
Data frame containing coverage per amplicon per sample, typically from |
output.directory |
Directory where per sample plots should be saved |
None
Use values obtained by bedtools coverage to make a plot of coverage by genome order
## S3 method for class 'coverage.by.genome.order'
plot(coverage.data)
coverage.data |
data frame with results from bedtools coverage command |
Make a barplot of coverage per sample
## S3 method for class 'coverage.by.sample'
plot(coverage.sample, file.name, statistic = c("mean", "median"))
coverage.sample |
Data frame of coverage data, typically from |
file.name |
Name of output file |
statistic |
Statistic to be plotted (mean or median) |
None
Make a scatterplot of ontarget percent per sample
## S3 method for class 'ontarget.percent'
plot(coverage.sample, file.name)
coverage.sample |
Data frame of coverage data, typically from |
file.name |
Name of output file |
None
Make a barplot of percent paired reads per sample
## S3 method for class 'paired.percent'
plot(coverage.sample, file.name)
coverage.sample |
Data frame of coverage data, typically from |
file.name |
Name of output file |
None
Post-processing of variants to generate outputs
post.processing(variant.specification, project.directory, config.file = NULL, variant.callers = NULL, remove.structural.variants = TRUE, separate.consensus.filters = FALSE, sleep = FALSE, verbose = FALSE)
variant.specification |
Data frame specifying variants to be processed, or path to data frame (useful if calling from Perl) |
project.directory |
Directory where output should be stored. Output files will be saved to a datestamped subdirectory |
config.file |
Path to config file specifying post-processing options. If not provided, the current options are used (i.e. from |
variant.callers |
Optional vector of variant callers for which filters should be included in Excel file |
remove.structural.variants |
Logical indicating whether structural variants (including CNVs) should be removed. Defaults to TRUE. |
separate.consensus.filters |
Logical indicating whether to apply different thresholds to variants called by more than one caller (specified under consensus in config file). Defaults to FALSE. |
sleep |
Logical indicating whether script should sleep for 60 seconds before starting. |
verbose |
Logical indicating whether to print verbose output |
None
This function prepares a data frame that can be used to run variant callers. For matched normal variant calling, this data frame will contain three columns with names: sample.id, tumour.bam, normal.bam. For unpaired variant calling, the data frame will contain two columns with names: sample.id, tumour.bam.
prepare.bam.specification(sample.details, paired = TRUE, sample.id.column = 1, tumour.bam.column = 2, normal.bam.column = 3)
sample.details |
Data frame where each row represents a sample to be run. Must contain sample ID, path to tumour BAM, and path to normal BAM. |
paired |
Logical indicating whether the sample specification is for a paired analysis. |
sample.id.column |
Index or string giving column of sample.details that contains the sample ID |
tumour.bam.column |
Index or string giving column of sample.details that contains the path to the tumour BAM |
normal.bam.column |
Index or string giving column of sample.details that contains the path to the normal BAM |
bam.specification Data frame with one row per sample to be run
Prepare FASTQ specification data frame to standardized format for downstream analyses.
prepare.fastq.specification(sample.details, sample.id.column = 1, fastq.columns = c(2, 3), patient.id.column = NA, tissue.column = NA)
sample.details |
Data frame where each row represents a sample to be run. Must contain sample ID and path(s) to the FASTQ file(s). |
sample.id.column |
Index or string giving column of |
fastq.columns |
Index or string giving column(s) of |
patient.id.column |
Index or string giving column of |
tissue.column |
Index or string giving column of |
This function prepares a data frame that can be used to run alignment. For paired-end reads, this data frame will contain three columns with names: sample.id, reads, mates. For single-end reads, the data frame will contain two columns with names: sample.id, reads.
Data frame with one row per sample to be run
Process a MiniSeq directory and sample sheet to get specification data frames that can be used to run the VariTAS pipeline.
Note: This assumes normal samples are not available.
prepare.miniseq.specifications(sample.sheet, miniseq.directory)
sample.sheet |
Data frame containing sample information, or path to a MiniSeq sample sheet |
miniseq.directory |
Path to directory with MiniSeq files |
A list with specification data frames 'fastq', 'bam', and 'vcf' (as applicable)
miniseq.sheet <- file.path(path.package('varitas'), 'extdata/miniseq/Example_template.csv')
miniseq.directory <- file.path(path.package('varitas'), 'extdata/miniseq')
miniseq.info <- prepare.miniseq.specifications(miniseq.sheet, miniseq.directory)
Prepare VCF specification data frame for annotation
prepare.vcf.specification(vcf.details, sample.id.column = 1, vcf.column = 2, job.dependency.column = NA, caller.column = NA)
vcf.details |
Data frame containing details of VCF files |
sample.id.column |
Identifier of column in |
vcf.column |
Identifier of column in |
job.dependency.column |
Identifier of column in |
caller.column |
Identifier of column in |
Properly formatted VCF details
Process the coverage reports generated by bedtools coverage tool.
process.coverage.reports(project.directory)
project.directory |
Path to project directory. Each sample should have its own subdirectory |
final.statistics data frame of coverage statistics generated by parsing through coverage reports
Takes *selfSM reports generated by VerifyBamID during alignment, and returns a vector of freemix scores. The freemix score is a sequence-only estimate of sample contamination that ranges from 0 to 1.
Note: Targeted panels are often too small for this step to work properly.
process.sample.contamination.checks(project.directory)
project.directory |
Path to project directory. Each sample should have its own subdirectory |
freemix.scores Data frame giving sample contamination (column freemix) score per sample.
https://genome.sph.umich.edu/wiki/VerifyBamID
Process reports generated by flagstat. Assumes reports for before and after off-target filtering have been written to the same file, with separating headers
process.total.coverage.statistics(project.directory)
project.directory |
Path to project directory. Each sample should have its own subdirectory |
data frame with extracted statistics
Read all calls made with a certain caller
read.all.calls(sample.ids, caller = c("vardict", "mutect", "pgm"), project.directory, patient.ids = NULL, apply.filters = TRUE, variant.file.pattern = NULL)
sample.ids |
Vector giving sample IDs to process |
caller |
String indicating which caller was used |
project.directory |
Path to project directory |
patient.ids |
Optional vector giving patient ID (or other group) corresponding to each sample |
apply.filters |
Logical indicating whether filters specified in VariTAS options should be applied. Defaults to TRUE. |
variant.file.pattern |
Pattern indicating where the variant file can be found. Sample ID should be indicated by SAMPLE_ID |
combined.variant.calls Data frame with variant calls from all patients
Read output from iDES_step1.pl and return data frame
read.ides.file(filename)
filename |
path to file |
ides.data data frame read from iDES output
Read variant calls from file and format for ease of downstream analyses.
read.variant.calls(variant.file, variant.caller)
variant.file |
Path to variant file. |
variant.caller |
String indicating which variant caller was used. Needed to format the headers. |
variant.calls Data frame of variant calls
Read a yaml file
read.yaml(file.name)
file.name |
Path to yaml file |
list containing contents of yaml file
read.yaml(file.path(path.package('varitas'), 'config.yaml'))
Run alignment
run.alignment(fastq.specification, output.directory, paired.end = FALSE, sample.directories = TRUE, output.subdirectory = FALSE, job.name.prefix = NULL, job.group = "alignment", quiet = FALSE, verify.options = !quiet)
fastq.specification |
Data frame detailing FASTQ files to be processed, typically from prepare.fastq.specification |
output.directory |
Path to project directory |
paired.end |
Logical indicating whether paired-end sequencing was performed |
sample.directories |
Logical indicating whether all sample files should be saved to sample-specific subdirectories (will be created) |
output.subdirectory |
If further nesting is required, name of subdirectory. If no further nesting, set to FALSE |
job.name.prefix |
Prefix for job names on the cluster |
job.group |
Group job should be associated with on cluster |
quiet |
Logical indicating whether to print commands to screen rather than submit them |
verify.options |
Logical indicating whether to run verify.varitas.options |
Runs alignment (and related processing steps) on each sample.
None
run.alignment(
    fastq.specification = data.frame(
        sample.id = c('1', '2'),
        reads = c('1-R1.fastq.gz', '2-R1.fastq.gz'),
        mates = c('1-R2.fastq.gz', '2-R2.fastq.gz'),
        patient.id = c('P1', 'P1'),
        tissue = c('tumour', 'normal')
    ),
    output.directory = '.',
    quiet = TRUE,
    paired.end = TRUE
)
Run alignment for a single sample
run.alignment.sample(fastq.files, sample.id, output.directory = NULL, output.filename = NULL, code.directory = NULL, log.directory = NULL, config.file = NULL, job.dependencies = NULL, job.name = NULL, job.group = NULL, quiet = FALSE, verify.options = !quiet)
fastq.files |
Paths to FASTQ files (one file if single-end reads, two files if paired-end) |
sample.id |
Sample ID for labelling |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
job.name |
Name of job to be submitted |
job.group |
Group job should belong to |
quiet |
Logical indicating whether to print command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
verify.options |
Logical indicating whether to run verify.varitas.options |
Run all the scripts generated by previous parts of the pipeline, without using HPC commands
run.all.scripts(output.directory, stages.to.run = c("alignment", "qc", "calling", "annotation", "merging"), variant.callers = NULL, quiet = FALSE)
output.directory |
Main directory where all files should be saved |
stages.to.run |
A character vector of all stages that need running |
variant.callers |
A character vector of variant callers to run |
quiet |
Logical indicating whether to print commands to screen rather than submit jobs. Defaults to FALSE, can be useful to set to TRUE for testing. |
None
Takes a data frame with paths to VCF files, and runs ANNOVAR annotation on each file. To allow for smooth connections with downstream pipeline steps, the function returns a variant specification data frame that can be used as input to merging steps.
run.annotation(vcf.specification, output.directory = NULL, job.name.prefix = NULL, job.group = NULL, quiet = FALSE, verify.options = !quiet)
vcf.specification |
Data frame detailing VCF files to be processed, from |
output.directory |
Path to folder where code and log files should be stored in their respective subdirectories. If not supplied, code and log files will be stored in the directory with each VCF file. |
job.name.prefix |
Prefix to be added before VCF name in job name. Defaults to 'annotate', but should be changed if running multiple callers to avoid |
job.group |
Group job should be associated with on cluster |
quiet |
Logical indicating whether to print commands to screen rather than submit them |
verify.options |
Logical indicating whether to run verify.varitas.options |
Data frame with details of variant files
run.annotation( data.frame( sample.id = c('a', 'b'), vcf = c('a.vcf', 'b.vcf'), caller = c('mutect', 'mutect') ), output.directory = '.', quiet = TRUE )
Run ANNOVAR on a VCF file
run.annovar.vcf(vcf.file, output.directory = NULL, output.filename = NULL, code.directory = NULL, log.directory = NULL, config.file = NULL, job.dependencies = NULL, job.group = NULL, job.name = NULL, isis = FALSE, quiet = FALSE, verify.options = !quiet)
vcf.file |
Path to VCF file |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
job.group |
Group job should belong to |
job.name |
Name of job to be submitted |
isis |
Logical indicating whether VCF files are from the isis (MiniSeq) variant caller |
quiet |
Logical indicating whether to print command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
verify.options |
Logical indicating whether to run verify.varitas.options |
None
Run filtering on an ANNOVAR-annotated txt file
run.filtering.txt(variant.file, caller = c("consensus", "vardict", "ides", "mutect"), output.directory = NULL, output.filename = NULL, code.directory = NULL, log.directory = NULL, config.file = NULL, job.dependencies = NULL, job.group = NULL, quiet = FALSE)
variant.file |
Path to variant file |
caller |
String giving variant caller that was used (affects which filters are applied). |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
job.group |
Group job should belong to |
quiet |
Logical indicating whether to print command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
Run iDES
run.ides(project.directory, sample.id.pattern = "._S\\d+$", sample.ids = NULL, job.dependencies = NULL)
project.directory |
Directory containing files |
sample.id.pattern |
Regex pattern to match sample IDs |
sample.ids |
Vector of sample IDs |
job.dependencies |
Vector of job dependencies |
Run iDES step 1 on each sample to tally up calls by strand. Files are output to the sample subdirectory.
None
Deprecated function for running iDES. Follows previous development package without specification data frames
https://cappseq.stanford.edu/ides/
Run LoFreq for a sample
run.lofreq.sample(tumour.bam, sample.id, paired, normal.bam = NULL, output.directory = NULL, output.filename = NULL, code.directory = NULL, log.directory = NULL, config.file = NULL, job.dependencies = NULL, quiet = FALSE, job.name = NULL, verify.options = !quiet, job.group = NULL)
tumour.bam |
Path to tumour sample BAM file. |
sample.id |
Sample ID for labelling |
paired |
Logical indicating whether to do variant calling with a matched normal. |
normal.bam |
Path to normal BAM file if |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
quiet |
Logical indicating whether to print command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
job.name |
Name of job to be submitted |
verify.options |
Logical indicating whether to run verify.varitas.options |
job.group |
Group job should belong to |
Run MuSE for a sample
run.muse.sample(tumour.bam, sample.id, paired, normal.bam = NULL, output.directory = NULL, output.filename = NULL, code.directory = NULL, log.directory = NULL, config.file = NULL, job.dependencies = NULL, quiet = FALSE, job.name = NULL, verify.options = !quiet, job.group = NULL)
tumour.bam |
Path to tumour sample BAM file. |
sample.id |
Sample ID for labelling |
paired |
Logical indicating whether to do variant calling with a matched normal. |
normal.bam |
Path to normal BAM file if |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
quiet |
Logical indicating whether to print command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
job.name |
Name of job to be submitted |
verify.options |
Logical indicating whether to run verify.varitas.options |
job.group |
Group job should belong to |
Run MuTect for a sample
run.mutect.sample(tumour.bam, sample.id, paired, normal.bam = NULL, output.directory = NULL, output.filename = NULL, code.directory = NULL, log.directory = NULL, config.file = NULL, job.dependencies = NULL, quiet = FALSE, job.name = NULL, verify.options = !quiet, job.group = NULL)
tumour.bam |
Path to tumour sample BAM file. |
sample.id |
Sample ID for labelling |
paired |
Logical indicating whether to do variant calling with a matched normal. |
normal.bam |
Path to normal BAM file if |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
quiet |
Logical indicating whether to print command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
job.name |
Name of job to be submitted |
verify.options |
Logical indicating whether to run verify.varitas.options |
job.group |
Group job should belong to |
Submit post-processing job to the cluster with appropriate job dependencies
run.post.processing(variant.specification, output.directory, code.directory = NULL, log.directory = NULL, config.file = NULL, job.name.prefix = NULL, quiet = FALSE, email = NULL, verify.options = !quiet)
variant.specification |
Data frame specifying files to be processed |
output.directory |
Path to directory where output should be saved |
code.directory |
Directory where code should be saved |
log.directory |
Directory where log files should be saved |
config.file |
Path to config file |
job.name.prefix |
Prefix for job names on the cluster |
quiet |
Logical indicating whether to print commands to screen rather than submit the job |
email |
Email address that should be notified when job finishes. If NULL or FALSE, no email is sent |
verify.options |
Logical indicating whether |
None
run.post.processing( variant.specification = data.frame( sample.id = c('a', 'b'), vcf = c('a.vcf', 'b.vcf'), caller = c('mutect', 'mutect'), job.dependency = c('example1', 'example2') ), output.directory = '.', quiet = TRUE )
Perform sample QC by looking at target coverage.
run.target.qc(bam.specification, project.directory, sample.directories = TRUE, paired = FALSE, output.subdirectory = FALSE, quiet = FALSE, job.name.prefix = NULL, verify.options = FALSE, job.group = "target_qc")
bam.specification |
Data frame containing details of BAM files to be processed, typically from |
project.directory |
Path to project directory where code and log files should be saved |
sample.directories |
Logical indicating whether output for each sample should be put in its own directory (within output.directory) |
paired |
Logical indicating whether the analysis is paired. This does not affect QC directly, but means normal samples get nested |
output.subdirectory |
If further nesting is required, name of subdirectory. If no further nesting, set to FALSE |
quiet |
Logical indicating whether to print commands to screen rather than submit the job |
job.name.prefix |
Prefix for job names on the cluster |
verify.options |
Logical indicating whether to run verify.varitas.options |
job.group |
Group job should be associated with on cluster |
Get on-target reads and run coverage quality control
run.target.qc.sample(bam.file, sample.id, output.directory = NULL, code.directory = NULL, log.directory = NULL, config.file = NULL, job.dependencies = NULL, job.name = NULL, job.group = NULL, quiet = FALSE)
bam.file |
Path to BAM file |
sample.id |
Sample ID for labelling |
output.directory |
Path to output directory |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
job.name |
Name of job to be submitted |
job.group |
Group job should belong to |
quiet |
Logical indicating whether to print command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
Run VarDict on a sample. Idea: have a low-level function that simply submits the job to Perl after the BAM paths have been found and the output paths have already been decided upon.
run.vardict.sample(tumour.bam, sample.id, paired, proton = FALSE, normal.bam = NULL, output.directory = NULL, output.filename = NULL, code.directory = NULL, log.directory = NULL, config.file = NULL, job.dependencies = NULL, job.name = NULL, job.group = NULL, quiet = FALSE, verify.options = !quiet)
tumour.bam |
Path to tumour sample BAM file. |
sample.id |
Sample ID for labelling |
paired |
Logical indicating whether to do variant calling with a matched normal. |
proton |
Logical indicating whether the data was generated by proton sequencing. Defaults to FALSE (i.e. Illumina) |
normal.bam |
Path to normal BAM file if |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
job.name |
Name of job to be submitted |
job.group |
Group job should belong to |
quiet |
Logical indicating whether to print command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
verify.options |
Logical indicating whether to run verify.varitas.options |
Run variant calling for all samples
run.variant.calling(bam.specification, output.directory, variant.callers = c("vardict", "mutect", "varscan", "lofreq", "muse"), paired = TRUE, proton = FALSE, sample.directories = TRUE, job.name.prefix = NULL, quiet = FALSE, verify.options = !quiet)
bam.specification |
Data frame containing details of BAM files to be processed, typically from |
output.directory |
Path to directory where output should be saved |
variant.callers |
Character vector of variant callers to be used |
paired |
Logical indicating whether to do variant calling with a matched normal |
proton |
Logical indicating whether data was generated by proton sequencing (ignored if running MuTect) |
sample.directories |
Logical indicating whether output for each sample should be put in its own directory (within output.directory) |
job.name.prefix |
Prefix for job names on the cluster |
quiet |
Logical indicating whether to print commands to screen rather than submit the job |
verify.options |
Logical indicating whether to run verify.varitas.options |
Run VarDict on each sample, and annotate the results with ANNOVAR. Files are output to a vardict/ subdirectory within each sample directory.
None
run.variant.calling( data.frame(sample.id = c('Z', 'Y'), tumour.bam = c('Z.bam', 'Y.bam')), output.directory = '.', variant.callers = c('lofreq', 'mutect'), quiet = TRUE, paired = FALSE )
Run all steps in VariTAS processing pipeline, with appropriate dependencies.
run.varitas.pipeline(file.details, output.directory, run.name = NULL, start.stage = c("alignment", "qc", "calling", "annotation", "merging"), variant.callers = NULL, proton = FALSE, quiet = FALSE, email = NULL, verify.options = !quiet, save.specification.files = !quiet)
file.details |
Data frame containing details of files to be used during first processing step. Depending on what you want to be the first step in the pipeline, this can either be FASTQ files, BAM files, VCF files, or variant (txt) files. |
output.directory |
Main directory where all files should be saved |
run.name |
Name of pipeline run. Will be added as a prefix to all LSF jobs. |
start.stage |
String indicating which stage pipeline should start at. If starting at a later stage of the pipeline, appropriate input files must be provided. For example, if starting with annotation, VCF files with variant calls must be provided. |
variant.callers |
Vector specifying which variant callers should be run. |
proton |
Logical indicating if data was generated by proton sequencing. Used to set base quality thresholds in variant calling steps. |
quiet |
Logical indicating whether to print commands to screen rather than submit jobs. Defaults to FALSE, can be useful to set to TRUE for testing. |
email |
Email address that should be notified when pipeline finishes. If NULL or FALSE, no email is sent. |
verify.options |
Logical indicating whether to run verify.varitas.options |
save.specification.files |
Logical indicating if specification files should be saved to project directory |
None
run.varitas.pipeline( file.details = data.frame( sample.id = c('1', '2'), reads = c('1-R1.fastq.gz', '2-R1.fastq.gz'), mates = c('1-R2.fastq.gz', '2-R2.fastq.gz'), patient.id = c('P1', 'P1'), tissue = c('tumour', 'normal') ), output.directory = '.', quiet = TRUE, run.name = "Test", variant.callers = c('mutect', 'varscan') )
Run the VariTAS pipeline starting from both VCF files and BAM/FASTQ files. Useful for processing data from the Ion PGM or MiniSeq, where variant calling has been done on the machine but you are interested in running more variant callers.
run.varitas.pipeline.hybrid(vcf.specification, output.directory, run.name = NULL, fastq.specification = NULL, bam.specification = NULL, variant.callers = c("mutect", "vardict", "varscan", "lofreq", "muse"), proton = FALSE, quiet = FALSE, email = NULL, verify.options = !quiet, save.specification.files = !quiet)
vcf.specification |
Data frame containing details of vcf files to be processed. Must contain columns sample.id, vcf, and caller |
output.directory |
Main directory where all files should be saved |
run.name |
Name of pipeline run. Will be added as a prefix to all LSF jobs. |
fastq.specification |
Data frame containing details of FASTQ files to be processed |
bam.specification |
Data frame containing details of BAM files to be processed |
variant.callers |
Vector specifying which variant callers should be run. |
proton |
Logical indicating if data was generated by proton sequencing. Used to set base quality thresholds in variant calling steps. |
quiet |
Logical indicating whether to print commands to screen rather than submit jobs. Defaults to FALSE, can be useful to set to TRUE for testing. |
email |
Email address that should be notified when pipeline finishes. If NULL or FALSE, no email is sent. |
verify.options |
Logical indicating whether to run verify.varitas.options |
save.specification.files |
Logical indicating if specification files should be saved to project directory |
None
run.varitas.pipeline.hybrid( bam.specification = data.frame(sample.id = c('Z', 'Y'), tumour.bam = c('Z.bam', 'Y.bam')), vcf.specification = data.frame( sample.id = c('a', 'b'), vcf = c('a.vcf', 'b.vcf'), caller = c('pgm', 'pgm') ), output.directory = '.', quiet = TRUE, run.name = "Test", variant.callers = c('mutect', 'varscan') )
Run VarScan for a sample
run.varscan.sample(tumour.bam, sample.id, paired, normal.bam = NULL, output.directory = NULL, output.filename = NULL, code.directory = NULL, log.directory = NULL, config.file = NULL, job.dependencies = NULL, quiet = FALSE, job.name = NULL, verify.options = !quiet, job.group = NULL)
tumour.bam |
Path to tumour sample BAM file. |
sample.id |
Sample ID for labelling |
paired |
Logical indicating whether to do variant calling with a matched normal. |
normal.bam |
Path to normal BAM file if |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
quiet |
Logical indicating whether to print command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
job.name |
Name of job to be submitted |
verify.options |
Logical indicating whether to run verify.varitas.options |
job.group |
Group job should belong to |
Save current varitas config options to a temporary file, and return filename.
save.config(output.file = NULL)
output.file |
Path to output file. If NULL (default), the config file will be saved as a temporary file. |
Path to config file
Save coverage statistics to multi-worksheet Excel file.
save.coverage.excel(project.directory, file.name, overwrite = TRUE)
project.directory |
Path to project directory |
file.name |
Name of output file |
overwrite |
Logical indicating whether to overwrite existing file if it exists. |
None
Makes an Excel workbook with variant calls. If filters are provided, these will be saved to an additional worksheet within the same file.
save.variants.excel(variants, file.name, filters = NULL, overwrite = TRUE)
variants |
Data frame containing variants |
file.name |
Name of output file |
filters |
Optional list of filters to be saved |
overwrite |
Logical indicating whether to overwrite existing file if it exists. Defaults to TRUE for consistency with other R functions. |
Set or overwrite options for the VariTAS pipeline. Nested options should be separated by a dot. For example, to update the reference genome for grch38, use reference_genome.grch38
set.varitas.options(...)
... |
options to set |
None
## Not run:
set.varitas.options(reference_build = 'grch38')
set.varitas.options(
    filters.mutect.min_normal_depth = 10,
    filters.vardict.min_normal_depth = 10
    )
## End(Not run)
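To make the dot-separated naming concrete, here is a minimal base R sketch of how a dotted option name can be mapped onto a nested list. The helper `set_nested_option` and the starting option values are hypothetical and only illustrate the idea; this is not the package's implementation.

```r
# Hypothetical helper (not part of varitas): set a value in a nested list
# using a dot-separated option name such as 'filters.mutect.min_normal_depth'.
set_nested_option <- function(options.list, name, value) {
    keys <- strsplit(name, split = '.', fixed = TRUE)[[1]]

    if (length(keys) == 1) {
        # no nesting left: assign directly
        options.list[[keys]] <- value
    } else {
        # descend one level, creating the sub-list if it does not exist yet
        child <- options.list[[keys[1]]]
        if (is.null(child)) child <- list()

        remaining <- paste(keys[-1], collapse = '.')
        options.list[[keys[1]]] <- set_nested_option(child, remaining, value)
    }

    return(options.list)
}

# made-up starting options, for illustration only
opts <- list(filters = list(mutect = list(min_normal_depth = 5)))

opts <- set_nested_option(opts, 'filters.mutect.min_normal_depth', 10)
opts <- set_nested_option(opts, 'reference_build', 'grch38')

str(opts)
```

Recursing one key at a time keeps sibling options untouched, so updating a single filter does not reset the rest of the `filters` list.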
Split data frame on a concatenated column.
## S3 method for class 'on.column' split(dat, column, split.character)
dat |
Data frame to be processed |
column |
Name of column to split on |
split.character |
Pattern giving character to split column on |
Data frame after splitting on column
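As a rough illustration of what splitting on a concatenated column can look like, the base R sketch below expands each row into one row per value. The helper name `split_on_column_sketch` and the example columns are hypothetical, and the package function may behave differently in edge cases.

```r
# Hypothetical helper (not the package function): expand the rows of a data
# frame so that each value in a delimiter-separated column gets its own row.
split_on_column_sketch <- function(dat, column, split.character) {
    # split every entry of the column into its component values
    pieces <- strsplit(as.character(dat[[column]]), split = split.character)
    n.pieces <- lengths(pieces)

    # repeat each row once per component, then overwrite the column
    expanded <- dat[rep(seq_len(nrow(dat)), times = n.pieces), , drop = FALSE]
    expanded[[column]] <- unlist(pieces)
    rownames(expanded) <- NULL

    return(expanded)
}

# made-up example data
example <- data.frame(
    sample.id = c('a', 'b'),
    caller = c('mutect:vardict', 'lofreq'),
    stringsAsFactors = FALSE
    )

split.example <- split_on_column_sketch(example, 'caller', ':')
print(split.example)
```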
Simply calculates the depth of coverage of the variant allele given a string of DP4 values
## S3 method for class 'dp4' sum(dp4.str)
dp4.str |
String of DP4 values in the form "1234,1234,1234,1234" |
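The sketch below shows one way such a calculation could work. It assumes the usual samtools/bcftools DP4 ordering (reference forward, reference reverse, alternate forward, alternate reverse) and treats variant allele depth as the sum of the last two fields. The helper name `variant_depth_from_dp4` is hypothetical, and whether the package function uses exactly this definition is an assumption.

```r
# Hypothetical helper (not the package function). Assumes DP4 is ordered as
# ref-forward, ref-reverse, alt-forward, alt-reverse.
variant_depth_from_dp4 <- function(dp4.str) {
    counts <- as.numeric(strsplit(dp4.str, split = ',')[[1]])

    if (length(counts) != 4 || anyNA(counts)) {
        stop('dp4.str must contain four comma-separated numbers')
    }

    # variant allele depth = alternate-allele reads on both strands
    return(sum(counts[3:4]))
}

variant.depth <- variant_depth_from_dp4('10,12,3,5')
print(variant.depth)
```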
Runs the ls command on the system. This is a workaround, since list.files cannot match patterns based on subdirectory structure.
system.ls(pattern = "", directory = "", error = FALSE)
pattern |
pattern to match files |
directory |
base directory command should be run from |
error |
logical indicating whether to throw an error if no matching files are found. Defaults to FALSE. |
paths returned by ls command
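For comparison, base R's `Sys.glob` can also match wildcard patterns that span subdirectories. The sketch below is not how `system.ls` is implemented; it only illustrates the kind of subdirectory-aware matching described above, using a throwaway directory tree with made-up sample names.

```r
# build a small temporary tree: <root>/<sample>/vardict/<sample>.vcf
root <- file.path(tempdir(), 'glob_sketch')
for (sample.id in c('S1', 'S2')) {
    dir.create(
        file.path(root, sample.id, 'vardict'),
        recursive = TRUE,
        showWarnings = FALSE
        )
    file.create(file.path(root, sample.id, 'vardict', paste0(sample.id, '.vcf')))
}

# a file that should not match, because it is not inside a vardict/ directory
file.create(file.path(root, 'S1', 'other.vcf'))

# wildcard on the sample directory, fixed caller subdirectory
vcf.paths <- Sys.glob(file.path(root, '*', 'vardict', '*.vcf'))
print(basename(vcf.paths))
```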
Calculate the mean of data in tabular format
tabular.mean(values, frequencies, ...)
values |
vector of values |
frequencies |
frequency corresponding to each value |
... |
Additional parameters passed to |
calculated mean
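A mean over tabulated data is a frequency-weighted mean. The following sketch uses a hypothetical helper `tabular_mean_sketch` and made-up depth values; it does not forward extra arguments the way the package function does.

```r
# Hypothetical helper: mean of data given as values plus their frequencies
tabular_mean_sketch <- function(values, frequencies) {
    stopifnot(length(values) == length(frequencies))
    sum(values * frequencies) / sum(frequencies)
}

# made-up table: coverage depth and the number of bases at that depth
depth <- c(0, 10, 20)
n.bases <- c(1, 2, 1)

tab.mean <- tabular_mean_sketch(depth, n.bases)

# same result as expanding the table and taking the ordinary mean
expanded.mean <- mean(rep(depth, times = n.bases))

print(c(tabular = tab.mean, expanded = expanded.mean))
```

Working from the table avoids materialising one element per observation, which matters when frequencies are per-base counts.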
Calculate the median of data in tabular format
tabular.median(values, frequencies, ...)
values |
Vector of values |
frequencies |
Frequency corresponding to each value |
... |
Additional parameters passed to |
calculated median
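One way to get a median from tabulated data without expanding it is to walk the cumulative frequencies. The sketch below uses a hypothetical helper `tabular_median_sketch` and assumes non-negative integer frequencies; the package function may use a different approach.

```r
# Hypothetical helper: median of data given as values plus integer frequencies
tabular_median_sketch <- function(values, frequencies) {
    stopifnot(length(values) == length(frequencies))

    # sort the table by value and accumulate the counts
    ord <- order(values)
    values <- values[ord]
    cum.freq <- cumsum(frequencies[ord])
    n <- cum.freq[length(cum.freq)]

    # positions of the middle observation(s) in the sorted, expanded data
    middle <- c(floor((n + 1) / 2), ceiling((n + 1) / 2))

    # first value whose cumulative frequency reaches each middle position
    middle.values <- sapply(
        middle,
        function(k) values[which(cum.freq >= k)[1]]
        )

    mean(middle.values)
}

odd.median <- tabular_median_sketch(c(20, 0, 10), c(3, 1, 1))
even.median <- tabular_median_sketch(c(1, 2), c(2, 2))

print(c(odd = odd.median, even = even.median))
```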
Make barplot of trinucleotide substitutions
trinucleotide.barplot(variants, file.name)
variants |
Data frame with variants |
file.name |
Name of output file |
None
Make barplot of variant recurrence
variant.recurrence.barplot(variants, file.name)
variants |
Data frame with variants |
file.name |
Name of output file |
None
Make barplot of variants per caller
variants.caller.barplot(variants, file.name, group.by = NULL)
variants |
Data frame with variants |
file.name |
Name of output file |
group.by |
Optional grouping variable for barplot |
None
Make barplot of variants per sample
variants.sample.barplot(variants, file.name)
variants |
Data frame with variants |
file.name |
Name of output file |
None
Check that sample specification data frame matches expected format, and that all files exist
verify.bam.specification(bam.specification)
bam.specification |
Data frame containing columns sample.id and tumour.bam, and optionally a column normal.bam. |
None
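The checks described above (required columns present, files on disk) can be sketched in base R as follows. The helper `check_bam_specification` and its error messages are hypothetical; the package function may perform additional or different checks.

```r
# Hypothetical checker, not the package's verify.bam.specification
check_bam_specification <- function(bam.specification) {
    if (!is.data.frame(bam.specification)) {
        stop('bam.specification must be a data frame')
    }

    # required columns named in the documentation
    missing.columns <- setdiff(c('sample.id', 'tumour.bam'), names(bam.specification))
    if (length(missing.columns) > 0) {
        stop('missing columns: ', paste(missing.columns, collapse = ', '))
    }

    # normal.bam is optional, so only check it when present
    bam.columns <- intersect(c('tumour.bam', 'normal.bam'), names(bam.specification))
    bam.files <- as.character(unlist(bam.specification[bam.columns], use.names = FALSE))
    bam.files <- bam.files[!is.na(bam.files)]

    missing.files <- bam.files[!file.exists(bam.files)]
    if (length(missing.files) > 0) {
        stop('files not found: ', paste(missing.files, collapse = ', '))
    }

    invisible(NULL)
}

# throwaway file standing in for a real BAM
tumour.bam <- tempfile(fileext = '.bam')
file.create(tumour.bam)

good.specification <- data.frame(
    sample.id = 'Z',
    tumour.bam = tumour.bam,
    stringsAsFactors = FALSE
    )
check_bam_specification(good.specification)

# a specification without tumour.bam should be rejected
bad.specification <- data.frame(sample.id = 'Y', stringsAsFactors = FALSE)
bad.result <- tryCatch(
    check_bam_specification(bad.specification),
    error = function(e) conditionMessage(e)
    )
print(bad.result)
```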
Verify that bwa index files exist for a fasta file
verify.bwa.index(fasta.file, error = FALSE)
fasta.file |
Fasta file to check |
error |
Logical indicating whether to throw an (informative) error if verification fails |
index.files.exist: Logical indicating if bwa index files were found (only returned if error is set to FALSE)
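A check of this kind can be sketched in base R by looking for the files that `bwa index` writes next to the FASTA (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). The helper `bwa_index_exists` is hypothetical, and the exact set of files the package function looks for is an assumption.

```r
# Hypothetical helper, not the package's verify.bwa.index. Assumes the index
# was built with `bwa index`, which writes five files alongside the FASTA.
bwa_index_exists <- function(fasta.file) {
    index.files <- paste0(fasta.file, c('.amb', '.ann', '.bwt', '.pac', '.sa'))
    all(file.exists(index.files))
}

# demonstrate on throwaway files standing in for a reference genome
fasta.file <- tempfile(fileext = '.fa')
file.create(fasta.file)
before <- bwa_index_exists(fasta.file)

# create empty placeholder index files
file.create(paste0(fasta.file, c('.amb', '.ann', '.bwt', '.pac', '.sa')))
after <- bwa_index_exists(fasta.file)

print(c(before = before, after = after))
```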
Verify that fasta index files exist for a given fasta file.
verify.fasta.index(fasta.file, error = FALSE)
fasta.file |
Fasta file to check |
error |
Logical indicating whether to throw an (informative) error if verification fails |
faidx.exists: Logical indicating if fasta index files were found (only returned if error is set to FALSE)
Check that FASTQ specification data frame matches expected format, and that all files exist
verify.fastq.specification(fastq.specification, paired.end = FALSE, files.ready = FALSE)
fastq.specification |
Data frame containing columns sample.id and reads, and optionally a column mates |
paired.end |
Logical indicating whether paired end reads are used |
files.ready |
Logical indicating if the files already exist on disk. If there are job dependencies, this should be set to FALSE. |
None
Verify that sequence dictionary exists for a fasta file.
verify.sequence.dictionary(fasta.file, error = FALSE)
fasta.file |
Fasta file to check |
error |
Logical indicating whether to throw an (informative) error if verification fails |
dict.exists: Logical indicating if sequence dictionary files were found (only returned if error is set to FALSE)
Check against common errors in the VariTAS options before launching into pipeline
verify.varitas.options(stages.to.run = c("alignment", "qc", "calling", "annotation", "merging"), variant.callers = c("mutect", "vardict", "ides", "varscan", "lofreq", "muse"), varitas.options = NULL)
stages.to.run |
Vector indicating which stages should be run. Defaults to all possible stages. If only running a subset of stages, only checks corresponding to the desired stages are run |
variant.callers |
Vector indicating which variant callers to run. Only used if calling is in |
varitas.options |
Optional file path or list of VariTAS options. |
None
Verify that VCF specification data frame fits expected format
verify.vcf.specification(vcf.specification)
vcf.specification |
VCF specification data frame |
None