Title: | Optimize the De Novo Stacks Pipeline via R |
---|---|
Description: | Offers a handful of useful wrapper functions which streamline the reading, analyzing, and visualizing of variant call format (vcf) files in R. This package was designed to facilitate an explicit pipeline for optimizing Stacks (Rochette et al., 2019) (<doi:10.1111/mec.15253>) parameters during de novo (without a reference genome) assembly and variant calling of restriction-enzyme associated DNA sequence (RADseq) data. The pipeline implemented here is based on the 2017 paper "Lost in Parameter Space" (Paris et al., 2017) (<doi:10.1111/2041-210X.12775>) which establishes clear recommendations for optimizing the parameters 'm', 'M', and 'n', during the process of assembling loci. |
Authors: | Devon DeRaad [aut, cre] |
Maintainer: | Devon DeRaad <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-12-14 06:44:49 UTC |
Source: | CRAN |
This function takes the path(s) to Stacks vcf file(s) as input. It provides slots for M values from 1 to 8 (as recommended by Paris et al. 2017). After running Stacks with each M value, pass the output vcf files to this function to calculate the effect of varying M on the number of SNPs/loci built. Pass the output of this function to vis_loci() to visualize the optimal M value for your dataset at the 'R80' cutoff (Paris et al. 2017).
optimize_bigM( M1 = NULL, M2 = NULL, M3 = NULL, M4 = NULL, M5 = NULL, M6 = NULL, M7 = NULL, M8 = NULL )
M1 |
Path to the input vcf file for a run when M=1 |
M2 |
Path to the input vcf file for a run when M=2 |
M3 |
Path to the input vcf file for a run when M=3 |
M4 |
Path to the input vcf file for a run when M=4 |
M5 |
Path to the input vcf file for a run when M=5 |
M6 |
Path to the input vcf file for a run when M=6 |
M7 |
Path to the input vcf file for a run when M=7 |
M8 |
Path to the input vcf file for a run when M=8 |
A list containing four summary dataframes: 'snp', showing the number of non-missing SNPs retained in each sample at each M value; 'loci', showing the number of non-missing loci retained in each sample at each M value; 'snp.R80', showing the total number of SNPs retained at an 80% completeness cutoff; and 'loci.R80', showing the total number of polymorphic loci retained at an 80% completeness cutoff.
optimize_bigM(M1=system.file("extdata","bigM1.vcf.gz",package="RADstackshelpR",mustWork=TRUE))
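The 'R80' summaries count SNPs (or loci) that are genotyped in a large enough share of samples. As a rough illustration of that cutoff, and not the package's internal code, the sketch below applies it to a small made-up genotype matrix, assuming the cutoff means a non-missing call in at least 80% of samples. The object names (`geno`, `completeness`, `snps_r80`, `snps_per_sample`) and the data are hypothetical.

```r
# Hypothetical genotype matrix: rows are SNPs, columns are samples,
# NA marks a missing genotype call.
geno <- matrix(
  c("0/0", "0/1", "0/0", "1/1", "0/1",   # snp1: 5 of 5 samples called
    "0/0", NA,    "0/1", "0/0", "0/0",   # snp2: 4 of 5 samples called
    NA,    NA,    "0/1", "0/0", "1/1",   # snp3: 3 of 5 samples called
    "0/1", "0/0", NA,    NA,    NA),     # snp4: 2 of 5 samples called
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("snp", 1:4), paste0("sample", 1:5))
)

# Proportion of samples with a non-missing call at each SNP.
completeness <- rowMeans(!is.na(geno))

# Total SNPs retained at the assumed 80% completeness cutoff (2 here).
snps_r80 <- sum(completeness >= 0.8)

# Number of non-missing SNPs in each sample, analogous to the per-sample
# counts described for the 'snp' dataframe.
snps_per_sample <- colSums(!is.na(geno))
```

Comparing `snps_r80` across runs with different parameter values is the kind of comparison the 'snp.R80' and 'loci.R80' summaries are meant to support.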
This function takes the path(s) to Stacks vcf file(s) as input. It provides slots for m values from 3 to 7 (as recommended by Paris et al. 2017). After running Stacks with each m value, pass the output vcf files to this function to calculate the effect of varying m on depth and on the number of SNPs/loci built. Pass the output of this function to vis_loci() to visualize the optimal m value for your dataset at the 'R80' cutoff (Paris et al. 2017).
optimize_m(m3 = NULL, m4 = NULL, m5 = NULL, m6 = NULL, m7 = NULL)
m3 |
Path to the input vcf file for a run when m=3 |
m4 |
Path to the input vcf file for a run when m=4 |
m5 |
Path to the input vcf file for a run when m=5 |
m6 |
Path to the input vcf file for a run when m=6 |
m7 |
Path to the input vcf file for a run when m=7 |
A list containing five summary dataframes, 'depth' showing depth per sample for each m value, 'snp' showing the number of non-missing SNPs retained in each sample at each m value, 'loci' showing the number of non-missing loci retained in each sample at each m value, 'snp.R80' showing the total number of SNPs retained at an 80% completeness cutoff, and 'loci.R80' showing the total number of polymorphic loci retained at an 80% completeness cutoff.
optimize_m(m3=system.file("extdata","m3.vcf.gz",package="RADstackshelpR",mustWork=TRUE))
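When several Stacks runs are available, the m3 to m7 slots can be filled programmatically rather than typed out one by one. The sketch below is one possible way to do that. The `stacks_out` directory and the `m<value>.vcf` file naming are hypothetical, and the call to optimize_m() only runs if the package is installed and at least one of those files exists.

```r
# m values covered by the slots of optimize_m().
m_values <- 3:7

# Hypothetical file layout: one vcf per run, e.g. stacks_out/m3.vcf.
vcf_paths <- file.path("stacks_out", paste0("m", m_values, ".vcf"))

# Named list whose names match the argument names m3 ... m7.
all_args <- setNames(as.list(vcf_paths), paste0("m", m_values))

# Keep only runs whose vcf file exists; omitted slots keep their NULL default.
args <- all_args[file.exists(unlist(all_args))]

if (length(args) > 0 && requireNamespace("RADstackshelpR", quietly = TRUE)) {
  # Equivalent to writing optimize_m(m3 = "...", m4 = "...", ...) by hand.
  m_out <- do.call(RADstackshelpR::optimize_m, args)
}
```

The same pattern should carry over to optimize_bigM() by swapping the slot names for M1 to M8.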
This function takes the path(s) to Stacks vcf file(s) as input. It provides slots for setting the n parameter to M-1, M, and M+1 (as recommended by Paris et al. 2017). After running Stacks with each n value, pass the output vcf files to this function to summarize the effect of varying n on the number of SNPs and loci built, and so identify the value of n that is optimal for your dataset at the 'R80' cutoff (Paris et al. 2017).
optimize_n(nequalsMminus1 = NULL, nequalsM = NULL, nequalsMplus1 = NULL)
nequalsMminus1 |
Path to the input vcf file for a run when n=M-1 |
nequalsM |
Path to the input vcf file for a run when n=M |
nequalsMplus1 |
Path to the input vcf file for a run when n=M+1 |
A dataframe showing the number of SNPs and loci retained across filtering levels for each n value
optimize_n(nequalsM = system.file("extdata","nequalsm.vcf.gz",package="RADstackshelpR",mustWork=TRUE))
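Because the three slots are defined relative to M, it can help to write down which n value belongs to which argument before running Stacks. The following sketch does that bookkeeping in base R. The value of `best_M`, the `stacks_out` directory, and the `n<value>.vcf` file naming are all hypothetical.

```r
# Hypothetical optimal M chosen in the previous optimization step.
best_M <- 3

# Candidate n values and the optimize_n() argument each one maps to.
n_runs <- data.frame(
  argument = c("nequalsMminus1", "nequalsM", "nequalsMplus1"),
  n = c(best_M - 1, best_M, best_M + 1)
)

# Hypothetical file layout: one vcf per run, e.g. stacks_out/n3.vcf.
n_runs$vcf <- file.path("stacks_out", paste0("n", n_runs$n, ".vcf"))

# Named list that could be passed on with do.call(optimize_n, n_args)
# once the corresponding Stacks runs have produced these files.
n_args <- setNames(as.list(n_runs$vcf), n_runs$argument)
```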
This function takes the list of dataframes output by optimize_m() as input. The function then uses ggplot2 to visualize the effect of m on depth.
vis_depth(output = NULL)
output |
A list containing 5 dataframes generated by optimize_m() |
A plot showing the depth of each sample at each given m value
vis_depth(output = readRDS(system.file("extdata","optimize.m.output.RDS",package="RADstackshelpR",mustWork=TRUE)))
This function takes the list of dataframes output by optimize_m(), optimize_bigM(), or optimize_n() as input. The function then uses ggplot2 to visualize the effect of the given Stacks parameter on the number of polymorphic loci retained, reporting which value is optimal.
vis_loci(output = NULL, stacks_param = NULL)
output |
The output of optimize_m(), optimize_bigM(), or optimize_n() |
stacks_param |
A character string indicating the stacks parameter iterated over |
A plot showing the number of polymorphic loci retained at each given parameter value
vis_loci(output = readRDS(system.file("extdata","optimize.m.output.RDS",package="RADstackshelpR",mustWork=TRUE)), stacks_param = "m")
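To make the idea of an "optimal" value concrete, the sketch below picks the parameter value with the most polymorphic loci retained at the R80 cutoff. This assumes that "optimal" means the maximum R80 loci count. It is an illustration of the selection rule only, not the code vis_loci() runs, and the counts and the `loci_r80` data frame are made up.

```r
# Made-up R80 counts of polymorphic loci for each m value.
loci_r80 <- data.frame(
  m    = 3:7,
  loci = c(1200, 1350, 1310, 1180, 990)
)

# Assumed selection rule: the value retaining the most R80 loci is optimal.
best_row <- which.max(loci_r80$loci)
best_m   <- loci_r80$m[best_row]

# Flag the optimal row, e.g. for highlighting it in a plot.
loci_r80$optimal <- seq_len(nrow(loci_r80)) == best_row
```

With these made-up counts, `best_m` is 4. Note that `which.max()` returns the first maximum, so ties would resolve to the smallest parameter value.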
This function takes the list of dataframes output by optimize_m(), optimize_bigM(), or optimize_n() as input. The function then uses ggplot2 to visualize the effect of the given Stacks parameter on the number of SNPs retained.
vis_snps(output = NULL, stacks_param = NULL)
output |
The output of optimize_m(), optimize_bigM(), or optimize_n() |
stacks_param |
A character string indicating the stacks parameter iterated over |
A plot showing the number of SNPs retained at each given parameter value
vis_snps(output= readRDS(system.file("extdata","optimize.m.output.RDS",package="RADstackshelpR",mustWork=TRUE)), stacks_param = "m")