Package 'seqminer' reference manual

Title:	Efficiently Read Sequence Data (VCF Format, BCF Format, METAL Format and BGEN Format) into R
Description:	Integrate sequencing data (Variant call format, e.g. VCF or BCF) or meta-analysis results in R. This package can help you (1) read VCF/BCF/BGEN files by chromosomal ranges (e.g. 1:100-200); (2) read RareMETAL summary statistics files; (3) read tables from a tabix-indexed files; (4) annotate VCF/BCF files; (5) create customized workflow based on Makefile.
Authors:	Xiaowei Zhan [aut, cre], Dajiang Liu [aut], Attractive Chaos [cph] (We have used the following software and made minimal necessary changes: Tabix, Heng Li <[email protected]> (MIT license). We removed standard IO related functions, e.g. printf, fprintf ; also changed its un-safe pointer arithmetics.), Broad Institute / Massachusetts Institute of Technology [cph], Genome Research Ltd (GRL) [cph], Facebook, Inc [cph], D. Richard Hipp [cph]
Maintainer:	Xiaowei Zhan <[email protected]>
License:	GPL \| file LICENSE
Version:	9.7
Built:	2025-01-31 06:52:45 UTC
Source:	CRAN

Read from binary PLINK file and return a genotype matrix

Description

Read from binary PLINK file and return a genotype matrix

Usage

## S3 method for class 'PlinkFile'
plinkFileObject[sampleIndex, markerIndex]
## S3 method for class 'PlinkFile'
plinkFileObject[sampleIndex, markerIndex]

Arguments

`plinkFileObject`	a PlinkFileObject obtained by openPlink()
`sampleIndex`	integer, 1-basd, index of samples to be extracted
`markerIndex`	integer, 1-basd, index of markers to be extracted

Value

genotype matrix, marker by sample

Examples

## these indice are nonsynonymous markers for 1:196621007-196716634",
## refer to the readVCFToMatrixByRange()
fileName = system.file("plink/all.anno.filtered.extract.bed", package = "seqminer")
filePrefix = sub(fileName, pattern = ".bed", replacement = "")
plinkObj = openPlink(filePrefix)
sampleIndex = seq(3)
markerIndex =c(14, 36)
cfh <- plinkObj[sampleIndex, markerIndex]
## these indice are nonsynonymous markers for 1:196621007-196716634",
## refer to the readVCFToMatrixByRange()
fileName = system.file("plink/all.anno.filtered.extract.bed", package = "seqminer")
filePrefix = sub(fileName, pattern = ".bed", replacement = "")
plinkObj = openPlink(filePrefix)
sampleIndex = seq(3)
markerIndex =c(14, 36)
cfh <- plinkObj[sampleIndex, markerIndex]

Add a job to a workflow

Description

Add a job to a workflow

Usage

addJob(wf, job)
addJob(wf, job)

Arguments

`wf`	a variable of workflow class
`job`	a variable of job class

Examples

j1 <- newJob('id1', 'cmd out1', 'out1')
j2 <- newJob('id2', 'cmd out2', 'out2', depend = 'out1')
w <- newWorkflow("wf")
w <- addJob(w, j1)
w <- addJob(w, j2)

outFile <- file.path(tempdir(), "Makefile")
writeWorkflow(w, outFile)
cat('Outputted Makefile file are in the temp directory:', outFile, '\n')
j1 <- newJob('id1', 'cmd out1', 'out1')
j2 <- newJob('id2', 'cmd out2', 'out2', depend = 'out1')
w <- newWorkflow("wf")
w <- addJob(w, j1)
w <- addJob(w, j2)

outFile <- file.path(tempdir(), "Makefile")
writeWorkflow(w, outFile)
cat('Outputted Makefile file are in the temp directory:', outFile, '\n')

Annotate a test variant

Description

Annotate a test variant

Usage

annotateGene(param, chrom, position, ref, alt)
annotateGene(param, chrom, position, ref, alt)

Arguments

`param`	a list of annotation configuaration (e.g. reference file, gene definition)
`chrom`	a vector of chromosome names
`position`	a vector of chromosome positions
`ref`	a vector of reference alleles
`alt`	a vector of alternative alleles

Value

annotated results in a data frame structure

Examples

if (.Platform$endian == "little") {
  param <- list(reference = system.file("tabanno/test.fa", package = "seqminer"),
                geneFile = system.file("tabanno/test.gene.txt", package = "seqminer"))
  param <- makeAnnotationParameter(param)
  print(param)
  annotateGene(param, c("1", "1"), c(3, 5) , c("A", "C"), c("G", "C"))
} else {
  message("Tabix does not work well for big endian for now")
}
if (.Platform$endian == "little") {
  param <- list(reference = system.file("tabanno/test.fa", package = "seqminer"),
                geneFile = system.file("tabanno/test.gene.txt", package = "seqminer"))
  param <- makeAnnotationParameter(param)
  print(param)
  annotateGene(param, c("1", "1"), c(3, 5) , c("A", "C"), c("G", "C"))
} else {
  message("Tabix does not work well for big endian for now")
}

Annotate a plain text file

Description

Annotate a plain text file

Usage

annotatePlain(inFile, outFile, params)
annotatePlain(inFile, outFile, params)

Arguments

`inFile`	input file name
`outFile`	output file name
`params`	parameters

Value

0 if succeed

Examples

param <- list(reference = system.file("tabanno/test.fa", package = "seqminer"),
              geneFile = system.file("tabanno/test.gene.txt", package = "seqminer"),
              inputFormat = "plain")
param <- makeAnnotationParameter(param)
inFile <- system.file("tabanno/input.test.plain.txt", package = "seqminer")
outFile <- file.path(tempdir(), "out.annotated.txt")
annotatePlain(inFile, outFile, param)
cat('Outputted annotation results are in the temp directory:', outFile, '\n')
param <- list(reference = system.file("tabanno/test.fa", package = "seqminer"),
              geneFile = system.file("tabanno/test.gene.txt", package = "seqminer"),
              inputFormat = "plain")
param <- makeAnnotationParameter(param)
inFile <- system.file("tabanno/input.test.plain.txt", package = "seqminer")
outFile <- file.path(tempdir(), "out.annotated.txt")
annotatePlain(inFile, outFile, param)
cat('Outputted annotation results are in the temp directory:', outFile, '\n')

Annotate a VCF file

Description

Annotate a VCF file

Usage

annotateVcf(inVcf, outVcf, params)
annotateVcf(inVcf, outVcf, params)

Arguments

`inVcf`	input VCF file name
`outVcf`	output VCF file name
`params`	parameters

Value

0 if succeed

Examples

param <- list(reference = system.file("tabanno/test.fa", package = "seqminer"),
              geneFile = system.file("tabanno/test.gene.txt", package = "seqminer"))
param <- makeAnnotationParameter(param)
inVcf <- system.file("tabanno/input.test.vcf", package = "seqminer")
outVcf <- file.path(tempdir(), "/", "out.vcf")
annotateVcf (inVcf, outVcf, param)
cat('Annotated VCF files are in the temp directory:', outVcf, '\n')
param <- list(reference = system.file("tabanno/test.fa", package = "seqminer"),
              geneFile = system.file("tabanno/test.gene.txt", package = "seqminer"))
param <- makeAnnotationParameter(param)
inVcf <- system.file("tabanno/input.test.vcf", package = "seqminer")
outVcf <- file.path(tempdir(), "/", "out.vcf")
annotateVcf (inVcf, outVcf, param)
cat('Annotated VCF files are in the temp directory:', outVcf, '\n')

Create a single chromosome index

Description

Create a single chromosome index

Usage

createSingleChromosomeBCFIndex(fileName, indexFileName = NULL)
createSingleChromosomeBCFIndex(fileName, indexFileName = NULL)

Arguments

`fileName`	character, represents an input BCF file (Bgzipped, with Tabix index)
`indexFileName`	character, by default, create 'fileName'.scIdx

Value

indexFileName if success, or NULL is failed

Examples

fileName = system.file("vcf/all.anno.filtered.extract.headerFixed.bcf.gz", package = "seqminer")
cfh <- createSingleChromosomeBCFIndex(fileName)
fileName = system.file("vcf/all.anno.filtered.extract.headerFixed.bcf.gz", package = "seqminer")
cfh <- createSingleChromosomeBCFIndex(fileName)

Create a single chromosome index

Description

Create a single chromosome index

Usage

createSingleChromosomeVCFIndex(fileName, indexFileName = NULL)
createSingleChromosomeVCFIndex(fileName, indexFileName = NULL)

Arguments

`fileName`	character, represents an input VCF file (Bgzipped, with Tabix index)
`indexFileName`	character, by default, create 'fileName'.scIdx

Value

indexFileName if success, or NULL is failed

Examples

fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
cfh <- createSingleChromosomeVCFIndex(fileName)
fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
cfh <- createSingleChromosomeVCFIndex(fileName)

Download annotation resources to a directory

Description

Download annotation resources to a directory

Usage

download.annotation.resource(outputDirectory)
download.annotation.resource(outputDirectory)

Arguments

outputDirectory

the directory to store annotation resources

Value

will not return anything

Examples

## Not run: 
download.annotation.resource("/tmp")

## End(Not run)
## Not run: 
download.annotation.resource("/tmp")

## End(Not run)

Extract pair of positions by ranges

Description

Extract pair of positions by ranges

Usage

getCovPair(covData, rangeList1, rangeList2)
getCovPair(covData, rangeList1, rangeList2)

Arguments

`covData`	a covariance matrix with positions as dimnames
`rangeList1`	character specify a range, 1-based index
`rangeList2`	character specify a range, 1-based index

Value

a covariance matrix covFileName = system.file("rvtests/rvtest.MetaCov.assoc.gz", package = "seqminer") cfh <- rvmeta.readCovByRange(covFileName, "1:196621007-196716634") rangeList1 <- "1:196621007-196700000" rangeList2 <- "1:196700000-196716634" getCovPair(cfh, rangeList1, rangeList2)

Annotate a test variant

Description

Annotate a test variant

Usage

getRefBase(reference, chrom, position, len = NULL)
getRefBase(reference, chrom, position, len = NULL)

Arguments

`reference`	path to the reference genome file (.fa file)
`chrom`	a vector of chromosome names
`position`	a vector of chromosome positions
`len`	a vector of length

Value

based extracted from the reference genome

Test whether directory is writable

Description

Test whether directory is writable

Usage

isDirWritable(outDir)
isDirWritable(outDir)

Arguments

outDir

the name of the directory

Value

TRUE if the file is writable isDirWritable("~")

Test whether a vector of positions are inside given ranges

Description

Test whether a vector of positions are inside given ranges

Usage

isInRange(positions, rangeList)
isInRange(positions, rangeList)

Arguments

`positions`	characters, positions. e.g. c("1:2-3", "1:4")
`rangeList`	character, ranges, e.g. "1:1-3,1:2-4", 1-based index

Value

logical vector, TRUE/FALSE/NA

Examples

positions <- c("1:2-3", "1:4", "XX")
ranges <- "1:1-3,1:2-4,1:5-10"
isInRange(positions, ranges)
positions <- c("1:2-3", "1:4", "XX")
ranges <- "1:1-3,1:2-4,1:5-10"
isInRange(positions, ranges)

Check if the inputs are valid tabix range such as chr1:2-300

Description

Check if the inputs are valid tabix range such as chr1:2-300

Usage

isTabixRange(range)
isTabixRange(range)

Arguments

range

characer vector

Examples

valid <- isTabixRange(c("chr1:1-200", "X:1", "1:100-100", "chr1", "1:1-20,1:30-40"))
stopifnot(all(valid))
invalid <- isTabixRange(c(":1", "chr1::", ":-"))
stopifnot(all(!invalid))
valid <- isTabixRange(c("chr1:1-200", "X:1", "1:100-100", "chr1", "1:1-20,1:30-40"))
stopifnot(all(valid))
invalid <- isTabixRange(c(":1", "chr1::", ":-"))
stopifnot(all(!invalid))

Construct a usable set of annotation parameters

Description

Construct a usable set of annotation parameters

Usage

makeAnnotationParameter(param = NULL)
makeAnnotationParameter(param = NULL)

Arguments

param

a list of annotation elements

Value

list, a complete list of supported parameters

Create a new job

Description

Create a new job

Usage

newJob(id, cmd, outFile, depend = NULL)
newJob(id, cmd, outFile, depend = NULL)

Arguments

`id`	character, job ids.
`cmd`	character, commands to run
`outFile`	character, the output file names after command are run successfully
`depend`	character vector, specify the prerequisite files (e.g. outFile from other jobs)

Examples

j1 <- newJob('id1', 'cmd out1', 'out1')
j2 <- newJob('id2', 'cmd out2', 'out2', depend = 'out1')
j1 <- newJob('id1', 'cmd out1', 'out1')
j2 <- newJob('id2', 'cmd out2', 'out2', depend = 'out1')

Create a new workflow

Description

Create a new workflow

Usage

newWorkflow(name)
newWorkflow(name)

Arguments

name

character, specify the name of the workflow

Examples

w <- newWorkflow("wf")
w <- newWorkflow("wf")

Open binary PLINK files

Description

Open binary PLINK files

Usage

openPlink(fileName)
openPlink(fileName)

Arguments

fileName

character, represents the prefix of PLINK input file

Value

an PLINK file object with class name ("PlinkFile")

Examples

fileName = system.file("plink/all.anno.filtered.extract.bed", package = "seqminer")
fileName = sub(fileName, pattern = ".bed", replacement = "")
plinkObj <- openPlink(fileName)
str(plinkObj)
fileName = system.file("plink/all.anno.filtered.extract.bed", package = "seqminer")
fileName = sub(fileName, pattern = ".bed", replacement = "")
plinkObj <- openPlink(fileName)
str(plinkObj)

Read information from BGEN file in a given range and return a list

Description

Read information from BGEN file in a given range and return a list

Usage

readBGENToListByGene(fileName, geneFile, geneName)
readBGENToListByGene(fileName, geneFile, geneName)

Arguments

`fileName`	character, represents an input BGEN file (Bgzipped, with Tabix index)
`geneFile`	character, a text file listing all genes in refFlat format
`geneName`	character vector, which gene(s) to be extracted

Value

a list of chrom, pos, varid, rsid, alleles, isPhased, probability, sampleId

Examples

fileName = system.file("bgen/all.anno.filtered.extract.bgen", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- readBGENToListByGene(fileName, geneFile, "CFH")
fileName = system.file("bgen/all.anno.filtered.extract.bgen", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- readBGENToListByGene(fileName, geneFile, "CFH")

Read information from BGEN file in a given range and return a list

Description

Read information from BGEN file in a given range and return a list

Usage

readBGENToListByRange(fileName, range)
readBGENToListByRange(fileName, range)

Arguments

`fileName`	character, represents an input BGEN file (Bgzipped, with Tabix index)
`range`	character, a text indicating which range in the BGEN file to extract. e.g. 1:100-200, 1-based index

Value

a list of chrom, pos, varid, rsid, alleles, isPhased, probability, sampleId

Examples

fileName = system.file("bgen/all.anno.filtered.extract.bgen", package = "seqminer")
cfh <- readBGENToListByRange(fileName, "1:196621007-196716634")
fileName = system.file("bgen/all.anno.filtered.extract.bgen", package = "seqminer")
cfh <- readBGENToListByRange(fileName, "1:196621007-196716634")

Read a gene from BGEN file and return a genotype matrix

Description

Read a gene from BGEN file and return a genotype matrix

Usage

readBGENToMatrixByGene(fileName, geneFile, geneName)
readBGENToMatrixByGene(fileName, geneFile, geneName)

Arguments

`fileName`	character, represents an input BGEN file (Bgzipped, with Tabix index)
`geneFile`	character, a text file listing all genes in refFlat format
`geneName`	character vector, which gene(s) to be extracted

Value

genotype matrix

Examples

fileName = system.file("bgen/all.anno.filtered.extract.bgen", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- readBGENToMatrixByGene(fileName, geneFile, "CFH")
fileName = system.file("bgen/all.anno.filtered.extract.bgen", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- readBGENToMatrixByGene(fileName, geneFile, "CFH")

Read a gene from BGEN file and return a genotype matrix

Description

Read a gene from BGEN file and return a genotype matrix

Usage

readBGENToMatrixByRange(fileName, range)
readBGENToMatrixByRange(fileName, range)

Arguments

`fileName`	character, represents an input BGEN file (Bgzipped, with Tabix index)
`range`	character, a text indicating which range in the BGEN file to extract. e.g. 1:100-200, 1-based index

Value

genotype matrix

Examples

fileName = system.file("bgen/all.anno.filtered.extract.bgen", package = "seqminer")
cfh <- readBGENToMatrixByRange(fileName, "1:196621007-196716634")
fileName = system.file("bgen/all.anno.filtered.extract.bgen", package = "seqminer")
cfh <- readBGENToMatrixByRange(fileName, "1:196621007-196716634")

Read from binary PLINK file and return a genotype matrix

Description

Read from binary PLINK file and return a genotype matrix

Usage

readPlinkToMatrixByIndex(plinkFilePrefix, sampleIndex, markerIndex)
readPlinkToMatrixByIndex(plinkFilePrefix, sampleIndex, markerIndex)

Arguments

`plinkFilePrefix`	a PlinkFileObject obtained by openPlink()
`sampleIndex`	integer, 1-basd, index of samples to be extracted
`markerIndex`	integer, 1-basd, index of markers to be extracted

Value

genotype matrix, marker by sample

Examples

## these indice are nonsynonymous markers for 1:196621007-196716634",
## refer to the readVCFToMatrixByRange()
fileName = system.file("plink/all.anno.filtered.extract.bed", package = "seqminer")
fileName = sub(fileName, pattern = ".bed", replacement = "")
sampleIndex = seq(3)
markerIndex =c(14, 36)
cfh <- readPlinkToMatrixByIndex(fileName, sampleIndex, markerIndex)
## these indice are nonsynonymous markers for 1:196621007-196716634",
## refer to the readVCFToMatrixByRange()
fileName = system.file("plink/all.anno.filtered.extract.bed", package = "seqminer")
fileName = sub(fileName, pattern = ".bed", replacement = "")
sampleIndex = seq(3)
markerIndex =c(14, 36)
cfh <- readPlinkToMatrixByIndex(fileName, sampleIndex, markerIndex)

Read a range from BCF file and return a genotype matrix

Description

Read a range from BCF file and return a genotype matrix

Usage

readSingleChromosomeBCFToMatrixByRange(fileName, range, indexFileName = NULL)
readSingleChromosomeBCFToMatrixByRange(fileName, range, indexFileName = NULL)

Arguments

`fileName`	character, represents an input BCF file (Bgzipped, with Tabix index)
`range`	character, a text indicating which range in the BCF file to extract. e.g. 1:100-200, 1-based index
`indexFileName`	character, index file, by default, it s 'fileName'.scIdx

Value

genotype matrix

Examples

fileName = system.file("vcf/all.anno.filtered.extract.headerFixed.bcf.gz", package = "seqminer")
cfh <- readSingleChromosomeBCFToMatrixByRange(fileName, "1:196621007-196716634")
fileName = system.file("vcf/all.anno.filtered.extract.headerFixed.bcf.gz", package = "seqminer")
cfh <- readSingleChromosomeBCFToMatrixByRange(fileName, "1:196621007-196716634")

Read a range from VCF file and return a genotype matrix

Description

Read a range from VCF file and return a genotype matrix

Usage

readSingleChromosomeVCFToMatrixByRange(fileName, range, indexFileName = NULL)
readSingleChromosomeVCFToMatrixByRange(fileName, range, indexFileName = NULL)

Arguments

`fileName`	character, represents an input VCF file (Bgzipped, with Tabix index)
`range`	character, a text indicating which range in the VCF file to extract. e.g. 1:100-200, 1-based index
`indexFileName`	character, index file, by default, it s 'fileName'.scIdx

Value

genotype matrix

Examples

fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
cfh <- readSingleChromosomeVCFToMatrixByRange(fileName, "1:196621007-196716634")
fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
cfh <- readSingleChromosomeVCFToMatrixByRange(fileName, "1:196621007-196716634")

Read information from VCF file in a given range and return a list

Description

Read information from VCF file in a given range and return a list

Usage

readVCFToListByGene(
  fileName,
  geneFile,
  geneName,
  annoType,
  vcfColumn,
  vcfInfo,
  vcfIndv
)
readVCFToListByGene(
  fileName,
  geneFile,
  geneName,
  annoType,
  vcfColumn,
  vcfInfo,
  vcfIndv
)

Arguments

`fileName`	character, represents an input VCF file (Bgzipped, with Tabix index)
`geneFile`	character, a text file listing all genes in refFlat format
`geneName`	character vector, which gene(s) to be extracted
`annoType`	character, annotated types you would like to extract, such as "Nonsynonymous", "Synonymous". This can be left empty.
`vcfColumn`	character vector, which vcf columns to extract. It can be chosen from CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT and etc.
`vcfInfo`	character vector, which should be tags in the INFO columns to extarct. Common choices include: DP, AC, AF, NS
`vcfIndv`	character vector, which values to extract at individual level. Common choices are: GT, GQ, GD

Value

a list of genes, and each elements has specified vcfColumn, vcfinfo, vcfIndv

Examples

fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- readVCFToListByGene(fileName, geneFile, "CFH", "Synonymous",
                           c("CHROM", "POS"), c("AF", "AC"), c("GT") )
fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- readVCFToListByGene(fileName, geneFile, "CFH", "Synonymous",
                           c("CHROM", "POS"), c("AF", "AC"), c("GT") )

Read information from VCF file in a given range and return a list

Description

Read information from VCF file in a given range and return a list

Usage

readVCFToListByRange(fileName, range, annoType, vcfColumn, vcfInfo, vcfIndv)
readVCFToListByRange(fileName, range, annoType, vcfColumn, vcfInfo, vcfIndv)

Arguments

`fileName`	character, represents an input VCF file (Bgzipped, with Tabix index)
`range`	character, a text indicating which range in the VCF file to extract. e.g. 1:100-200, 1-based index
`annoType`	character, annotated types you would like to extract, such as "Nonsynonymous", "Synonymous". This can be left empty.
`vcfColumn`	character vector, which vcf columns to extract. It can be chosen from CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT and etc.
`vcfInfo`	character vector, which should be tags in the INFO columns to extarct. Common choices include: DP, AC, AF, NS
`vcfIndv`	character vector, which values to extract at individual level. Common choices are: GT, GQ, GD

Value

a list of genes, and each elements has specified vcfColumn, vcfinfo, vcfIndv

Examples

fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
cfh <- readVCFToListByRange(fileName, "1:196621007-196716634", "Nonsynonymous",
                            c("CHROM", "POS"), c("AF", "AC"), c("GT") )
fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
cfh <- readVCFToListByRange(fileName, "1:196621007-196716634", "Nonsynonymous",
                            c("CHROM", "POS"), c("AF", "AC"), c("GT") )

Read a gene from VCF file and return a genotype matrix

Description

Read a gene from VCF file and return a genotype matrix

Usage

readVCFToMatrixByGene(fileName, geneFile, geneName, annoType)
readVCFToMatrixByGene(fileName, geneFile, geneName, annoType)

Arguments

`fileName`	character, represents an input VCF file (Bgzipped, with Tabix index)
`geneFile`	character, a text file listing all genes in refFlat format
`geneName`	character vector, which gene(s) to be extracted
`annoType`	character, annotated types you would like to extract, such as "Nonsynonymous", "Synonymous". This can be left empty.

Value

genotype matrix

Examples

fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- readVCFToMatrixByGene(fileName, geneFile, "CFH", "Synonymous")
fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- readVCFToMatrixByGene(fileName, geneFile, "CFH", "Synonymous")

Read a gene from VCF file and return a genotype matrix

Description

Read a gene from VCF file and return a genotype matrix

Usage

readVCFToMatrixByRange(fileName, range, annoType)
readVCFToMatrixByRange(fileName, range, annoType)

Arguments

`fileName`	character, represents an input VCF file (Bgzipped, with Tabix index)
`range`	character, a text indicating which range in the VCF file to extract. e.g. 1:100-200, 1-based index
`annoType`	character, annotated types you would like to extract, such as "Nonsynonymous", "Synonymous". This can be left empty.

Value

genotype matrix

Examples

fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
cfh <- readVCFToMatrixByRange(fileName, "1:196621007-196716634", "Nonsynonymous")
fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
cfh <- readVCFToMatrixByRange(fileName, "1:196621007-196716634", "Nonsynonymous")

Read covariance by range from METAL-format files.

Description

Read covariance by range from METAL-format files.

Usage

rvmeta.readCovByRange(covFile, tabixRange)
rvmeta.readCovByRange(covFile, tabixRange)

Arguments

`covFile`	character, a covariance file (rvtests outputs using –meta cov)
`tabixRange`	character, a text indicating which range in the VCF file to extract. e.g. 1:100-200

Value

a matrix of covariance within given range

Examples

covFileName = system.file("rvtests/rvtest.MetaCov.assoc.gz", package = "seqminer")
cfh <- rvmeta.readCovByRange(covFileName, "1:196621007-196716634")
covFileName = system.file("rvtests/rvtest.MetaCov.assoc.gz", package = "seqminer")
cfh <- rvmeta.readCovByRange(covFileName, "1:196621007-196716634")

Read association statistics by gene from METAL-format files. Both score statistics and covariance statistics will be extracted.

Description

Read association statistics by gene from METAL-format files. Both score statistics and covariance statistics will be extracted.

Usage

rvmeta.readDataByGene(
  scoreTestFiles,
  covFiles,
  geneFile,
  geneName,
  multiAllelic = FALSE
)
rvmeta.readDataByGene(
  scoreTestFiles,
  covFiles,
  geneFile,
  geneName,
  multiAllelic = FALSE
)

Arguments

`scoreTestFiles`	character vector, score test output files (rvtests outputs using –meta score)
`covFiles`	character vector, covaraite files (rvtests outputs using –meta cov)
`geneFile`	character, a text file listing all genes in refFlat format
`geneName`	character vector, which gene(s) to be extracted
`multiAllelic`	boolean, whether to read multi-allelic sites as multiple variants or not

Value

a list of statistics including chromosome, position, allele frequency, score statistics, covariance and annotation(if input files are annotated).

Examples

scoreFileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
covFileName = system.file("rvtests/rvtest.MetaCov.assoc.gz", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- rvmeta.readDataByGene(scoreFileName, covFileName, geneFile, "CFH")
scoreFileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
covFileName = system.file("rvtests/rvtest.MetaCov.assoc.gz", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- rvmeta.readDataByGene(scoreFileName, covFileName, geneFile, "CFH")

Read association statistics by range from METAL-format files. Both score statistics and covariance statistics will be extracted.

Description

Read association statistics by range from METAL-format files. Both score statistics and covariance statistics will be extracted.

Usage

rvmeta.readDataByRange(scoreTestFiles, covFiles, ranges, multiAllelic = FALSE)
rvmeta.readDataByRange(scoreTestFiles, covFiles, ranges, multiAllelic = FALSE)

Arguments

`scoreTestFiles`	character vector, score test output files (rvtests outputs using –meta score)
`covFiles`	character vector, covaraite files (rvtests outputs using –meta cov)
`ranges`	character, a text indicating which range in the VCF file to extract. e.g. 1:100-200, 1-based index
`multiAllelic`	boolean, whether to read multi-allelic sites as multiple variants or not

Value

a list of statistics including chromosome, position, allele frequency, score statistics, covariance and annotation(if input files are annotated).

Examples

scoreFileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
covFileName = system.file("rvtests/rvtest.MetaCov.assoc.gz", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- rvmeta.readDataByRange(scoreFileName, covFileName, "1:196621007-196716634")
scoreFileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
covFileName = system.file("rvtests/rvtest.MetaCov.assoc.gz", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- rvmeta.readDataByRange(scoreFileName, covFileName, "1:196621007-196716634")

Read null model statistics

Description

Read null model statistics

Usage

rvmeta.readNullModel(scoreTestFiles)
rvmeta.readNullModel(scoreTestFiles)

Arguments

scoreTestFiles

character vector, score test output files (rvtests outputs using –meta score)

Value

a list of statistics fitted under the null mode (without genetic effects)

Examples

scoreFileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
scoreFileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")

Read score test statistics by range from METAL-format files.

Description

Read score test statistics by range from METAL-format files.

Usage

rvmeta.readScoreByRange(scoreTestFiles, tabixRange)
rvmeta.readScoreByRange(scoreTestFiles, tabixRange)

Arguments

`scoreTestFiles`	character vector, score test output files (rvtests outputs using –meta score)
`tabixRange`	character, a text indicating which range in the VCF file to extract. e.g. 1:100-200

Value

score test statistics within given range

Examples

scoreFileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
cfh <- rvmeta.readScoreByRange(scoreFileName, "1:196621007-196716634")
scoreFileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
cfh <- rvmeta.readScoreByRange(scoreFileName, "1:196621007-196716634")

Read skew by range from METAL-format files.

Description

Read skew by range from METAL-format files.

Usage

rvmeta.readSkewByRange(skewFile, tabixRange)
rvmeta.readSkewByRange(skewFile, tabixRange)

Arguments

`skewFile`	character, a skew file (rvtests outputs using –meta skew)
`tabixRange`	character, a text indicating which range in the VCF file to extract. e.g. 1:100-200

Value

an 3-dimensional array of skewness within given range

Examples

skewFileName = system.file("rvtests/rvtest.MetaSkew.assoc.gz", package = "seqminer")
cfh <- rvmeta.readSkewByRange(skewFileName, "1:196621007-196716634")
skewFileName = system.file("rvtests/rvtest.MetaSkew.assoc.gz", package = "seqminer")
cfh <- rvmeta.readSkewByRange(skewFileName, "1:196621007-196716634")

Write covariance association statistics files.

Description

Write covariance association statistics files.

Usage

rvmeta.writeCovData(rvmetaData, outName)
rvmeta.writeCovData(rvmetaData, outName)

Arguments

`rvmetaData`	a list vector. It's usually read by rvmeta.readDataByRange or rvmeta.readDataByGene function
`outName`	character, a text indicating output file prefix

Value

TRUE only if succeed

Examples

scoreFileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
covFileName = system.file("rvtests/rvtest.MetaCov.assoc.gz", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- rvmeta.readDataByRange(scoreFileName, covFileName, "1:196621007-196716634")

outFile <- file.path(tempdir(), "cfh.MetaCov.assoc.gz")
rvmeta.writeCovData(cfh, outFile)
cat('Outputted MetaCov file are in the temp directory:', outFile, '\n')
scoreFileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
covFileName = system.file("rvtests/rvtest.MetaCov.assoc.gz", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- rvmeta.readDataByRange(scoreFileName, covFileName, "1:196621007-196716634")

outFile <- file.path(tempdir(), "cfh.MetaCov.assoc.gz")
rvmeta.writeCovData(cfh, outFile)
cat('Outputted MetaCov file are in the temp directory:', outFile, '\n')

Write score-based association statistics files.

Description

Write score-based association statistics files.

Usage

rvmeta.writeScoreData(rvmetaData, outName, createIndex = FALSE)
rvmeta.writeScoreData(rvmetaData, outName, createIndex = FALSE)

Arguments

`rvmetaData`	a list vector. It's usually read by rvmeta.readDataByRange or rvmeta.readDataByGene function
`outName`	character, a text indicating output file prefix
`createIndex`	boolean, (default FALSE), whether or not to create the index

Value

TRUE only if succeed

Examples

scoreFileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
covFileName = system.file("rvtests/rvtest.MetaCov.assoc.gz", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- rvmeta.readDataByRange(scoreFileName, covFileName, "1:196621007-196716634")

outFile <- file.path(tempdir(), "cfh.MetaScore.assoc")
rvmeta.writeScoreData(cfh, outFile)
cat('Outputted MetaScore file are in the temp directory:', outFile, '\n')
scoreFileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
covFileName = system.file("rvtests/rvtest.MetaCov.assoc.gz", package = "seqminer")
geneFile = system.file("vcf/refFlat_hg19_6col.txt.gz", package = "seqminer")
cfh <- rvmeta.readDataByRange(scoreFileName, covFileName, "1:196621007-196716634")

outFile <- file.path(tempdir(), "cfh.MetaScore.assoc")
rvmeta.writeScoreData(cfh, outFile)
cat('Outputted MetaScore file are in the temp directory:', outFile, '\n')

Efficiently Read Sequencing Data (VCF format, METAL format) into R

Description

SeqMiner provides functions to easily load Variant Call Format (VCF) or METAL format into R

Details

The aim of this package is to save your time parsing large text file. That means data processing time can be saved for other researches. This packages requires Bgzip compressed and Tabix indexed files as input. If input files contaians annotation by TabAnno (), it is possible to extract information at the unit of genes.

Author(s)

Maintainer: Xiaowei Zhan [email protected]

Authors:

Dajiang Liu [email protected]

Other contributors:

Attractive Chaos [email protected] (We have used the following software and made minimal necessary changes: Tabix, Heng Li <[email protected]> (MIT license). We removed standard IO related functions, e.g. printf, fprintf ; also changed its un-safe pointer arithmetics.) [copyright holder]
Broad Institute / Massachusetts Institute of Technology [copyright holder]
Genome Research Ltd (GRL) [copyright holder]
Facebook, Inc [copyright holder]
D. Richard Hipp [copyright holder]

Create tabix index file, similar to running tabix in command line.

Description

Create tabix index file, similar to running tabix in command line.

Usage

tabix.createIndex(
  bgzipFile,
  sequenceColumn = 1,
  startColumn = 4,
  endColumn = 5,
  metaChar = "#",
  skipLines = 0
)
tabix.createIndex(
  bgzipFile,
  sequenceColumn = 1,
  startColumn = 4,
  endColumn = 5,
  metaChar = "#",
  skipLines = 0
)

Arguments

`bgzipFile`	character, an tabix indexed file
`sequenceColumn`	integer, sequence name column
`startColumn`	integer, start column
`endColumn`	integer, end column
`metaChar`	character, symbol for comment/meta lines
`skipLines`	integer, first this number of lines will be skipped

Examples

fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
tabix.createIndex(fileName, 1, 2, 0, '#', 0)
fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
tabix.createIndex(fileName, 1, 2, 0, '#', 0)

Create tabix index for bgzipped MetaScore/MetaCov file

Description

Create tabix index for bgzipped MetaScore/MetaCov file

Usage

tabix.createIndex.meta(bgzipFile)
tabix.createIndex.meta(bgzipFile)

Arguments

bgzipFile

character, input vcf file

Examples

fileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
tabix.createIndex.meta(fileName)
fileName = system.file("rvtests/rvtest.MetaScore.assoc.anno.gz", package = "seqminer")
tabix.createIndex.meta(fileName)

Create tabix index for bgzipped VCF file

Description

Create tabix index for bgzipped VCF file

Usage

tabix.createIndex.vcf(bgzipVcfFile)
tabix.createIndex.vcf(bgzipVcfFile)

Arguments

bgzipVcfFile

character, input vcf file

Examples

fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
tabix.createIndex.vcf(fileName)
fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
tabix.createIndex.vcf(fileName)

Read tabix file, similar to running tabix in command line.

Description

Read tabix file, similar to running tabix in command line.

Usage

tabix.read(tabixFile, tabixRange)
tabix.read(tabixFile, tabixRange)

Arguments

`tabixFile`	character, an tabix indexed file
`tabixRange`	character, a text indicating which range in the VCF file to extract. e.g. 1:100-200

Value

character vector, each elements is an individual line

Examples

if (.Platform$endian == "little") {
  fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
  snp <- tabix.read(fileName, "1:196623337-196632470")
} else {
  message("Tabix does not work well for big endian for now")
}
if (.Platform$endian == "little") {
  fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
  snp <- tabix.read(fileName, "1:196623337-196632470")
} else {
  message("Tabix does not work well for big endian for now")
}

Read tabix file, similar to running tabix in command line.

Description

Read tabix file, similar to running tabix in command line.

Usage

tabix.read.header(tabixFile, skippedLine = FALSE)
tabix.read.header(tabixFile, skippedLine = FALSE)

Arguments

`tabixFile`	character, an tabix indexed file
`skippedLine`	logical, whether to read tabix skipped lines (when used 'tabix -S NUM')

Value

a list

Examples

fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
snp <- tabix.read.header(fileName)
fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
snp <- tabix.read.header(fileName)

Read tabix file, similar to running tabix in command line.

Description

Read tabix file, similar to running tabix in command line.

Usage

tabix.read.table(
  tabixFile,
  tabixRange,
  col.names = TRUE,
  stringsAsFactors = FALSE
)
tabix.read.table(
  tabixFile,
  tabixRange,
  col.names = TRUE,
  stringsAsFactors = FALSE
)

Arguments

`tabixFile`	character, an tabix indexed file
`tabixRange`	character, a text indicating which range in the VCF file to extract. e.g. 1:100-200
`col.names`	logical, use tabix file header as result headers (default: TRUE)
`stringsAsFactors`	logical, store loaded data as factors (default: FALSE)

Value

data frame, each elements is an individual line

Examples

fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
snp <- tabix.read.table(fileName, "1:196623337-196632470")
fileName = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
snp <- tabix.read.table(fileName, "1:196623337-196632470")

Validate annotate parameter is valid

Description

Validate annotate parameter is valid

Usage

validateAnnotationParameter(param, debug = FALSE)
validateAnnotationParameter(param, debug = FALSE)

Arguments

`param`	a list of annotation elements
`debug`	show extra debug information or not

Value

list, first element is TRUE/FALSE if parameter is valid/invalid;

validate the inVcf can be created, and outVcf can be write to. will stop if any error occurs

Description

validate the inVcf can be created, and outVcf can be write to. will stop if any error occurs

Usage

verifyFilename(inVcf, outVcf)
verifyFilename(inVcf, outVcf)

Arguments

`inVcf`	input file
`outVcf`	output file

Export workflow to Makefile

Description

Export workflow to Makefile

Usage

writeWorkflow(wf, outFile)
writeWorkflow(wf, outFile)

Arguments

`wf`	a variable workflow class
`outFile`	character, typically named "Makefile"

Examples

j1 <- newJob('id1', 'cmd out1', 'out1')
j2 <- newJob('id2', 'cmd out2', 'out2', depend = 'out1')
w <- newWorkflow("wf")
w <- addJob(w, j1)
w <- addJob(w, j2)

outFile <-  file.path(tempdir(), "Makefile")
writeWorkflow(w, outFile)
cat('Outputted Makefile file are in the temp directory:', outFile, '\n')
j1 <- newJob('id1', 'cmd out1', 'out1')
j2 <- newJob('id2', 'cmd out2', 'out2', depend = 'out1')
w <- newWorkflow("wf")
w <- addJob(w, j1)
w <- addJob(w, j2)

outFile <-  file.path(tempdir(), "Makefile")
writeWorkflow(w, outFile)
cat('Outputted Makefile file are in the temp directory:', outFile, '\n')

Package 'seqminer'

Help Index

Read from binary PLINK file and return a genotype matrix

Description

Usage

Arguments

Value

See Also

Examples

Add a job to a workflow

Description

Usage

Arguments

Examples

Annotate a test variant

Description

Usage

Arguments

Value

See Also

Examples

Annotate a plain text file

Description

Usage

Arguments

Value

Examples

Annotate a VCF file

Description

Usage

Arguments

Value

Examples

Create a single chromosome index

Description

Usage

Arguments

Value

See Also

Examples

Create a single chromosome index

Description

Usage

Arguments

Value

See Also

Examples

Download annotation resources to a directory

Description

Usage

Arguments

Value

Examples

Extract pair of positions by ranges

Description

Usage

Arguments

Value

Annotate a test variant

Description

Usage

Arguments

Value

Test whether directory is writable

Description

Usage

Arguments

Value

Test whether a vector of positions are inside given ranges

Description

Usage

Arguments

Value

Examples

Check if the inputs are valid tabix range such as chr1:2-300

Description

Usage

Arguments

Examples

Construct a usable set of annotation parameters