Package 'SeqFeatR' reference manual

Title:	A Tool to Associate FASTA Sequences and Features
Description:	Provides user friendly methods for the identification of sequence patterns that are statistically significantly associated with a property of the sequence. For instance, SeqFeatR allows to identify viral immune escape mutations for hosts of given HLA types. The underlying statistical method is Fisher's exact test, with appropriate corrections for multiple testing, or Bayes. Patterns may be point mutations or n-tuple of mutations. SeqFeatR offers several ways to visualize the results of the statistical analyses, see Budeus (2016) <doi:10.1371/journal.pone.0146409>.
Authors:	Bettina Budeus
Maintainer:	Bettina Budeus <[email protected]>
License:	GPL (>= 3)
Version:	0.3.1
Built:	2025-02-24 11:59:20 UTC
Source:	CRAN

Finding pairs of alignment positions that jointly mutate

Description

Determines pairs of sequence alignment positions that mutate in a correlated fashion with respect to a consensus sequence.

Usage

assocpair(path_to_file_sequence_alignment = NULL,
    path_to_file_consensus = NULL, save_name_csv, dna = FALSE,
    significance_level = 0.05, multiple_testing_correction = "bonferroni")assocpair(path_to_file_sequence_alignment = NULL,
    path_to_file_consensus = NULL, save_name_csv, dna = FALSE,
    significance_level = 0.05, multiple_testing_correction = "bonferroni")

Arguments

`path_to_file_sequence_alignment`	FASTA file with sequence alignment. See example file.
`path_to_file_consensus`	FASTA file with consensus sequence. See example file.
`save_name_csv`	name of file to which results are written in csv format.
`dna`	indicates whether sequences are DNA or amino acids.
`significance_level`	significance level for Fisher's exact test.
`multiple_testing_correction`	multiple testing correction applied to p-values. Input can be: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none".

Details

For every position in the sequence alignment from the FASTA file a Fisher's exact test is applied with every other position in the sequence to check whether at both positions we have correlated mutations with respect to a given consensus sequence. Significant p-values are collected in one big table. p.adjust from stats package is used for multiple testing correction; corrected values are given as extra column in csv output.

In contrast to assocpairfeat, assocpair does not use features, but uses a consensus approach. Please be sure, that this is really what you want to use. Otherwise, use assocpairfeat or assoctuple instead.

Value

A csv file with every possible co-mutation below the given p-value.

Note

For graphical output use:
visualizepair.

Author(s)

Bettina Budeus

Examples

#Input files
## Not run: 
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
consensus_input <- system.file("extdata", "Example_Consensus_aa.fasta", package="SeqFeatR")
	
#Usage
assocpair(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_consensus=consensus_input,
	save_name_csv="assocpair_results.csv",
	dna=FALSE,
	significance_level=0.05,
	multiple_testing_correction="bonferroni")

## End(Not run)
#Input files
## Not run: 
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
consensus_input <- system.file("extdata", "Example_Consensus_aa.fasta", package="SeqFeatR")
	
#Usage
assocpair(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_consensus=consensus_input,
	save_name_csv="assocpair_results.csv",
	dna=FALSE,
	significance_level=0.05,
	multiple_testing_correction="bonferroni")

## End(Not run)

Finding pairs of sequence alignment positions associated with sequence feature.

Description

Determines pairs of alignment positions that jointly mutate depending on sequence feature.

Usage

assocpairfeat(path_to_file_sequence_alignment = NULL, save_name_csv, dna = FALSE, 
    patnum_threshold = 1, significance_level = 0.05, 
    A11a, A12a, A21a, A22a, B11a, B12a, B21a, B22a,
    multiple_testing_correction = "bonferroni")assocpairfeat(path_to_file_sequence_alignment = NULL, save_name_csv, dna = FALSE, 
    patnum_threshold = 1, significance_level = 0.05, 
    A11a, A12a, A21a, A22a, B11a, B12a, B21a, B22a,
    multiple_testing_correction = "bonferroni")

Arguments

`path_to_file_sequence_alignment`	FASTA file with sequence alignment. See example file.
`save_name_csv`	name of file to which results are written in csv format.
`dna`	DNA or amino acid sequences.
`patnum_threshold`	minimum number of patients per HLA type to consider in calculation.
`significance_level`	significance level in Fisher's exact test.
`A11a`	position of start of first HLA A allele in header line of FASTA file.
`A12a`	position of end of first HLA A allele in header line of FASTA file.
`A21a`	position of start of second HLA A allele in header line of FASTA file.
`A22a`	position of end of second HLA A allele in header line of FASTA file.
`B11a`	position of start of first HLA B allele in header line of FASTA file.
`B12a`	position of end of first HLA B allele in header line of FASTA file.
`B21a`	position of start of second HLA B allele in header line of FASTA file.
`B22a`	position of end of second HLA B allele in header line of FASTA file.
`multiple_testing_correction`	multiple testing correction applied to p-values. Input can be: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none".

Details

For every position in the sequence alignment from the FASTA file a Fisher's exact test is applied with every other position in the sequence and every HLA-type. Significant p-values are collected in one big table. p.adjust from stats package is used for multiple testing correction; corrected values are given as extra column in csv output.

In contrast to assocpair, assocpairfeat uses an analysis with 'features' without a consensus sequence.

Value

Table with all alignment position pairs with significant association with sequence feature.

Note

Use visualizepairfeat for graphical output.

Author(s)

Bettina Budeus

Examples

#Input file
## Not run: 
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")

#Usage
assocpairfeat(
	path_to_file_sequence_alignment=fasta_input,
	save_name_csv="assocpairfeat_results.csv",
	dna=FALSE,
	patnum_threshold=1,
	significance_level=0.05,
	A11=10,
	A12=11,
	A21=13,
	A22=14,
	B11=17,
	B12=18,
	B21=20,
	B22=21,
	multiple_testing_correction="bonferroni")

## End(Not run)
#Input file
## Not run: 
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")

#Usage
assocpairfeat(
	path_to_file_sequence_alignment=fasta_input,
	save_name_csv="assocpairfeat_results.csv",
	dna=FALSE,
	patnum_threshold=1,
	significance_level=0.05,
	A11=10,
	A12=11,
	A21=13,
	A22=14,
	B11=17,
	B12=18,
	B21=20,
	B22=21,
	multiple_testing_correction="bonferroni")

## End(Not run)

Searches for associations of single alignment positions with feature(s)

Description

Searches for associations of single nucleotide or amino acid sequence alignment positions with feature(s).

Usage

assocpoint(path_to_file_sequence_alignment = NULL,
    path_to_file_known_epitopes = NULL, path_to_file_binding_motifs = NULL, save_name_pdf,
    save_name_csv, dna = FALSE, 
    patnum_threshold = 1, optical_significance_level = 0.05, 
    star_significance_level = 0.001, A11, A12, A21, A22, B11, B12, B21, 
    B22, C11, C12, C21, C22, has_C=FALSE, multiple_testing_correction = "bonferroni",
    bayes_factor = FALSE, constant_dirichlet_precision_parameter=FALSE,
    dirichlet_precision_parameter=20, phylo_bias_check = FALSE,
    path_to_file_reference_sequence=NULL, one_feature=FALSE,
    window_size=9, epi_plot=FALSE)assocpoint(path_to_file_sequence_alignment = NULL,
    path_to_file_known_epitopes = NULL, path_to_file_binding_motifs = NULL, save_name_pdf,
    save_name_csv, dna = FALSE, 
    patnum_threshold = 1, optical_significance_level = 0.05, 
    star_significance_level = 0.001, A11, A12, A21, A22, B11, B12, B21, 
    B22, C11, C12, C21, C22, has_C=FALSE, multiple_testing_correction = "bonferroni",
    bayes_factor = FALSE, constant_dirichlet_precision_parameter=FALSE,
    dirichlet_precision_parameter=20, phylo_bias_check = FALSE,
    path_to_file_reference_sequence=NULL, one_feature=FALSE,
    window_size=9, epi_plot=FALSE)

Arguments

`path_to_file_sequence_alignment`	FASTA file with sequence alignment. For reference see example file.
`path_to_file_known_epitopes`	csv file with known epitopes. See example file.
`path_to_file_binding_motifs`	csv file with HLA binding motifs if available. See example file.
`save_name_pdf`	name of file to which results are saved in pdf format.
`save_name_csv`	name of file to which results are saved in csv format.
`dna`	DNA or amino acid sequences.
`patnum_threshold`	minimum number of patients per HLA type to consider in calculation.
`optical_significance_level`	height of horizontal line in graphical output indicating significance level, e.g.\ 0.05.
`star_significance_level`	height of invisible horizontal line above which all points are marked as stars, e.g.\ 0.001 as level for high significance.
`A11`	position of start of first HLA A allele in header line of FASTA file.
`A12`	position of end of first HLA A allele in header line of FASTA file.
`A21`	position of start of second HLA A allele in header line of FASTA file.
`A22`	position of end of second HLA A allele in header line of FASTA file.
`B11`	position of start of first HLA B allele in header line of FASTA file.
`B12`	position of end of first HLA B allele in header line of FASTA file.
`B21`	position of start of second HLA B allele in header line of FASTA file.
`B22`	position of end of second HLA B allele in header line of FASTA file.
`C11`	position of start of first HLA C allele in header line of FASTA file.
`C12`	position of end of first HLA C allele in header line of FASTA file.
`C21`	position of start of second HLA C allele in header line of FASTA file.
`C22`	position of end of second HLA C allele in header line of FASTA file.
`has_C`	set to TRUE if there is a C allel information in header line of FASTA file.
`multiple_testing_correction`	multiple testing correction applied to p-values. Input can be: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none".
`bayes_factor`	if bayes factor should be applied instead of Fisher's exact test for association. See details.
`constant_dirichlet_precision_parameter`	if Dirichlet precision parameter K is a fixed value. See details.
`dirichlet_precision_parameter`	the Dirichlet precision parameter for evaluation with Bayes factor. See details.
`phylo_bias_check`	simple mechanism to detect phylogenetic bias in data. See details.
`path_to_file_reference_sequence`	Reference sequence for orientation. Will be added as extra column in the results. Not used in calculation itself.
`one_feature`	if there is only one feature. See details.
`window_size`	size of sliding window in graphical output.
`epi_plot`	add epitope plot to results pdf file.

Details

For each sequence alignment position (each column of sequence alignment), Fisher's exact test is evaluated for a 2-by-2 contingency table of amino acid (or nucleic acid) vs. feature. The resulting p-values are returned in a table. p-values can be corrected for multiple testing using various methods (option multiple_testing_correction).

If bayes_factor is TRUE, a simple calculation of a bayes_factor is applied instead of the Fisher's exact test. An independent model is compared with a nearly independent model (see. Mixtures of Dirichlet Distributions and Estimation in Contingency Tables). The user can choose if a fixed dirichlet precision parameter or a calculated one should be used. The calculated dirichlet precision parameter is dependend from the number of entries in the 2-by-2 contingency table. For more information see the mentioned paper and "Bayesian Computation with R". With Bayes Factor a usage of multiple testing is not possible (and in most of all cases not needed). If constant_dirichlet_precision_parameter is FALSE, the Dirichlet precision parameter K for the calculation of the Bayes Factors is calculated by the system.

The input sequence alignment may be consist either of DNA sequences (switch dna = TRUE) or amino acid sequences (dna = FALSE). Undetermined nucleotides or amino acids have to be indicated by the letter "X".

A phylogenetic bias check can be made if bayes_factor = FALSE and the result added as an extra column to the csv result file. For each alignment position with a p-value below a user given threshold (optical_significance_level), the mean distance of all sequences with the mutation, which gave rise to this p-value, is calculated and compared with the mean distance of all sequences.

Features may be either a single feature, or HLA types, the latter indicated by four blocks in the FASTA comment lines. The positions of these blocks in the comment lines are defined by parameters A11, ..., B22 (..C22 if there is information about the C allel). For patients with a homozygous HLA allele the second allele has to be "00" (without the double quotes). For single features (no HLA types), set option one_feature=TRUE. The value of the feature (e.g. 'yes / no', or '1 / 2 / 3') should then be given at the end of each FASTA comment, separated from the part before that by a semicolon.

Value

The function generates various types of output: a table of p-values, p-values corrected for multiple testing, z-scores, amino acid with the lowest p-value at this position, result of the phylogenetic bias test, and several plots. A Manhattan plot is generated for each feature in a pdf file with alignment positions on the x-axis and p-values on the y-axis. Two different significance levels can be indicated by a line (optical_significance_level) and the change from the normal dot symbol to a star (star_significance_level).

In case of bayes_factor = TRUE, the Bayes Factor and not a p-value and the estimate of the simulation standard error of the computed value of the Bayes factor is given.

Optionally (epi_plot = TRUE), a second plot is provided with the number of 'significant' p-values or Bayes Factors (value of significance chosen by optical_significance_level) in a sliding window of x amino acids ( x can be any number from 1 up to the length of the alignment chosen by 'window_size'. The typical length of an MHC I binding motif is nine). Each point in this plot is the number of 'significant' p-values or Bayes Factors in the next x alignment positions. Additionally, with given HLA motifs (path_to_file_binding_motifs), a second line is added. This line shows potential epitopes. A potential epitope is assigned if a 'significant' p-value or Bayes Factor (value of significance chosen by optical_significance_level) is inside one of the motifs for this HLA type.

Author(s)

Bettina Budeus

References

Albert, James H.; Gupta, Arjun K. Mixtures of Dirichlet Distributions and Estimation in Contingency Tables. Ann. Statist. 10 (1982), no. 4, 1261–1268. doi:10.1214/aos/1176345991. http://projecteuclid.org/euclid.aos/1176345991.

Albert, J. Bayesian Computation with R. Springer; 2nd ed. 2009 edition (May 15, 2009)

Examples

#Input files
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
epitopes_input <- system.file("extdata", "Example_epitopes_aa.csv", package="SeqFeatR")
motifs_input <- system.file("extdata", "Example_HLA_binding_motifs_aa.csv", package="SeqFeatR")
reference_input <- system.file("extdata", "Example_reference_aa.fasta", package="SeqFeatR")

#Usage
## Not run: 
assocpoint(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_known_epitopes=epitopes_input,
	path_to_file_binding_motifs=motifs_input,
	save_name_pdf="assocpoint_results.pdf",
	save_name_csv="assocpoint_results.csv",
	dna=FALSE,
	patnum_threshold=1,
	optical_significance_level=0.01, 
	star_significance_level=0.5,
	A11=10,
	A12=11,
	A21=13,
	A22=14,
	B11=17,
	B12=18,
	B21=20,
	B22=21,
	C11=24,
	C12=25,
	C21=27,
	C22=28,
	has_C=FALSE,
	multiple_testing_correction="bonferroni",
	bayes_factor=FALSE,
	constant_dirichlet_precision_parameter=TRUE,
	dirichlet_precision_parameter=20,
	phylo_bias_check=FALSE,
	path_to_file_reference_sequence=reference_input,
	one_feature=FALSE,
	window_size=9,
	epi_plot=TRUE)

## End(Not run)
#Input files
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
epitopes_input <- system.file("extdata", "Example_epitopes_aa.csv", package="SeqFeatR")
motifs_input <- system.file("extdata", "Example_HLA_binding_motifs_aa.csv", package="SeqFeatR")
reference_input <- system.file("extdata", "Example_reference_aa.fasta", package="SeqFeatR")

#Usage
## Not run: 
assocpoint(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_known_epitopes=epitopes_input,
	path_to_file_binding_motifs=motifs_input,
	save_name_pdf="assocpoint_results.pdf",
	save_name_csv="assocpoint_results.csv",
	dna=FALSE,
	patnum_threshold=1,
	optical_significance_level=0.01, 
	star_significance_level=0.5,
	A11=10,
	A12=11,
	A21=13,
	A22=14,
	B11=17,
	B12=18,
	B21=20,
	B22=21,
	C11=24,
	C12=25,
	C21=27,
	C22=28,
	has_C=FALSE,
	multiple_testing_correction="bonferroni",
	bayes_factor=FALSE,
	constant_dirichlet_precision_parameter=TRUE,
	dirichlet_precision_parameter=20,
	phylo_bias_check=FALSE,
	path_to_file_reference_sequence=reference_input,
	one_feature=FALSE,
	window_size=9,
	epi_plot=TRUE)

## End(Not run)

Searches for associations of single alignment positions with feature(s) with a simple hierarchical model

Description

Searches for associations of single nucleotide or amino acid sequence alignment positions with feature(s).

Usage

assocpointhierarchical(path_to_file_sequence_alignment = NULL,
    path_to_file_known_epitopes = NULL, path_to_file_binding_motifs = NULL, save_name_pdf,
    save_name_csv, dna = FALSE, 
    patnum_threshold = 1, optical_significance_level = 0.05, 
    star_significance_level = 0.001, A11, A12, A21, A22, B11, B12, B21, 
    B22, path_to_file_reference_sequence=NULL, one_feature=FALSE,
    window_size=9, epi_plot=FALSE, own_model, number_of_simulations,
    number_of_burnin, number_of_chains, response_inits,	further_response_data,
    response_parameters)assocpointhierarchical(path_to_file_sequence_alignment = NULL,
    path_to_file_known_epitopes = NULL, path_to_file_binding_motifs = NULL, save_name_pdf,
    save_name_csv, dna = FALSE, 
    patnum_threshold = 1, optical_significance_level = 0.05, 
    star_significance_level = 0.001, A11, A12, A21, A22, B11, B12, B21, 
    B22, path_to_file_reference_sequence=NULL, one_feature=FALSE,
    window_size=9, epi_plot=FALSE, own_model, number_of_simulations,
    number_of_burnin, number_of_chains, response_inits,	further_response_data,
    response_parameters)

Arguments

`path_to_file_sequence_alignment`	FASTA file with sequence alignment. For reference see example file.
`path_to_file_known_epitopes`	csv file with known epitopes. See example file.
`path_to_file_binding_motifs`	csv file with HLA binding motifs if available. See example file.
`save_name_pdf`	name of file to which results are saved in pdf format.
`save_name_csv`	name of file to which results are saved in csv format.
`dna`	DNA or amino acid sequences.
`patnum_threshold`	minimum number of patients per HLA type to consider in calculation.
`optical_significance_level`	height of horizontal line in graphical output indicating significance level, e.g.\ 0.05.
`star_significance_level`	height of invisible horizontal line above which all points are marked as stars, e.g.\ 0.001 as level for high significance.
`A11`	position of start of first HLA A allele in header line of FASTA file.
`A12`	position of end of first HLA A allele in header line of FASTA file.
`A21`	position of start of second HLA A allele in header line of FASTA file.
`A22`	position of end of second HLA A allele in header line of FASTA file.
`B11`	position of start of first HLA B allele in header line of FASTA file.
`B12`	position of end of first HLA B allele in header line of FASTA file.
`B21`	position of start of second HLA B allele in header line of FASTA file.
`B22`	position of end of second HLA B allele in header line of FASTA file.
`path_to_file_reference_sequence`	Reference sequence for orientation. Will be added as extra column in the results. Not used in calculation itself.
`one_feature`	if there is only one feature. See details.
`window_size`	size of sliding window in graphical output.
`epi_plot`	add epitope plot to results pdf file.
`own_model`	txt file with jags model - optional. See example file.
`number_of_simulations`	number of total iterations per chain (including burn in) - see R2jags package.
`number_of_burnin`	length of burn in, i.e. number of iterations to discard at the beginning - see R2jags package.
`number_of_chains`	number of Markov chains - see R2jags package.
`response_inits`	specification of initial value - see details and R2jags package.
`further_response_data`	additional response data - see details and rjags/R2jags package.
`response_parameters`	name of response parameters - see details and rjags/R2jags package.

Details

For a given alignment a simple hierarchical model (amino acids/ nucleotides at each position are multi-nominally distributed, features are gamma(0.1,0.1) distributed - see example file).

Features may be either a single feature, or HLA types, the latter indicated by four blocks in the FASTA comment lines. The positions of these blocks in the comment lines are defined by parameters A11, ..., B22. For patients with a homozygous HLA allele the second allele has to be "00" (without the double quotes). For single features (no HLA types), set option one_feature=TRUE. The value of the feature (e.g. 'yes / no', or '1 / 2 / 3') should then be given at the end of each FASTA comment, separated from the part before that by a semicolon.

However a user can add his own model and parameter (only for EXPERTS!): The jags model file has to be given as 'own_model'. This 'activates' the following parameters: 1. Further, the user can add other response data ('further_response_data') besides the internally already given matrix with all amino acids/nucleotides at every position and every feature and the colSums of this matrix. 2. The user has to insert initial values ('response_inits') and 3. response parameters ('response_parameters'). For the definition of initial values the following variables can be used: - all_of_letters = all amino acids/nucleotides plus gap symbol and unknown (X) - min_FASTA_length = length of alignment - allels = all HLA-allels/features A response parameter named theta will be evaluated by the part of the program after the model, therefore the user should keep in mind to name this in his own model: The result theta has to be of the following form: 3 dimensional, with first dimension: feature vs no feature, second dimension: amino acid/nucleotide, third dimension: position (see examples file).

Value

The function generates various types of output: a table of probabilities, z-scores, amino acid/nucleotide with the highest probability at this position and several plots. A Manhattan plot is generated for each feature in a pdf file with alignment positions on the x-axis and probabilities on the y-axis. Two different significance levels can be indicated by a line (optical_significance_level) and the change from the normal dot symbol to a star (star_significance_level).

Optionally (epi_plot = TRUE), a second plot is provided with the number of 'significant' p-values (value of significance chosen by optical_significance_level) in a sliding window of x amino acids ( x can be any number from 1 up to the length of the alignment chosen by 'window_size'. The typical length of an MHC I binding motif is nine). Each point in this plot is the number of 'significant' probabilities in the next x alignment positions. Additionally, with given HLA motifs (path_to_file_binding_motifs), a second line is added. This line shows potential epitopes. A potential epitope is assigned if a 'significant' probabilities (value of significance chosen by optical_significance_level) is inside one of the motifs for this HLA type.

Note

To uses this function, it is essential to install Jags. This seems to be easy for Windows and Linux systems, but complicated for MacOS. Therefore this function is not tested in installation process.

Author(s)

Bettina Budeus

References

Albert, J. Bayesian Computation with R. Springer; 2nd ed. 2009 edition (May 15, 2009)

Examples

#Input files
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
epitopes_input <- system.file("extdata", "Example_epitopes_aa.csv", package="SeqFeatR")
motifs_input <- system.file("extdata", "Example_HLA_binding_motifs_aa.csv", package="SeqFeatR")
own_model_input <- system.file("extdata", "Example_model_for_bayes.txt", package="SeqFeatR")
reference_input <- system.file("extdata", "Example_reference_aa.fasta", package="SeqFeatR")

#Usage
## Not run: 
assocpointhierarchical(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_known_epitopes=epitopes_input,
	path_to_file_binding_motifs=motifs_input,
	save_name_pdf = "assocpointhierarchical_results.pdf",
	save_name_csv = "assocpointhierarchical_results.csv",
	dna = FALSE,
	patnum_threshold = 1,
	optical_significance_level = 97, 
	star_significance_level = 99,
	A11 = 10,
	A12 = 11,
	A21 = 13,
	A22 = 14,
	B11 = 17,
	B12 = 18,
	B21 = 20,
	B22 = 21,
	path_to_file_reference_sequence = reference_input,
	one_feature=FALSE,
	window_size=9,
	epi_plot=TRUE,
	own_model=own_model_input,
	number_of_simulations=20,
	number_of_burnin=floor(20/2),
	number_of_chains=2,
	response_inits="'vcounts' = rep(1, length(all_of_letters)), 
       'theta' = array(1/length(all_of_letters),
        dim=c(2, length(all_of_letters), min_FASTA_length))",
	further_response_data=c(),
	response_parameters=c("theta", "vcounts"))

## End(Not run)
#Input files
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
epitopes_input <- system.file("extdata", "Example_epitopes_aa.csv", package="SeqFeatR")
motifs_input <- system.file("extdata", "Example_HLA_binding_motifs_aa.csv", package="SeqFeatR")
own_model_input <- system.file("extdata", "Example_model_for_bayes.txt", package="SeqFeatR")
reference_input <- system.file("extdata", "Example_reference_aa.fasta", package="SeqFeatR")

#Usage
## Not run: 
assocpointhierarchical(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_known_epitopes=epitopes_input,
	path_to_file_binding_motifs=motifs_input,
	save_name_pdf = "assocpointhierarchical_results.pdf",
	save_name_csv = "assocpointhierarchical_results.csv",
	dna = FALSE,
	patnum_threshold = 1,
	optical_significance_level = 97, 
	star_significance_level = 99,
	A11 = 10,
	A12 = 11,
	A21 = 13,
	A22 = 14,
	B11 = 17,
	B12 = 18,
	B21 = 20,
	B22 = 21,
	path_to_file_reference_sequence = reference_input,
	one_feature=FALSE,
	window_size=9,
	epi_plot=TRUE,
	own_model=own_model_input,
	number_of_simulations=20,
	number_of_burnin=floor(20/2),
	number_of_chains=2,
	response_inits="'vcounts' = rep(1, length(all_of_letters)), 
       'theta' = array(1/length(all_of_letters),
        dim=c(2, length(all_of_letters), min_FASTA_length))",
	further_response_data=c(),
	response_parameters=c("theta", "vcounts"))

## End(Not run)

Compare results of SeqFeatRs assocpoint with SeqFeatRs assocpair, assocpairfeat

Description

Calculates which significant positions in result of assocpoint are also in results of assocpair/assocpairfeat.

Usage

assocpointpair(path_to_file_sequence_alignment = NULL, 
    path_to_file_assocpoint_csv_result = NULL, 
    path_to_file_assocpairfeat_csv_result = NULL,
    significance_level = 0.05, save_name_csv, save_name_pos)assocpointpair(path_to_file_sequence_alignment = NULL, 
    path_to_file_assocpoint_csv_result = NULL, 
    path_to_file_assocpairfeat_csv_result = NULL,
    significance_level = 0.05, save_name_csv, save_name_pos)

Arguments

`path_to_file_sequence_alignment`	FASTA file with sequence alignment. For reference see example file.
`path_to_file_assocpoint_csv_result`	file with result from SeqFeatRs assocpoint. For reference see example file.
`path_to_file_assocpairfeat_csv_result`	file with results from SeqFeatRs assocpair, assocpairfeat. For reference see example file.
`significance_level`	p-value to be defined as significant.
`save_name_csv`	name of file to which results are saved in csv format.
`save_name_pos`	name of file to which results are saved in csv format (contains possible compensatory mutations).

Details

Takes the results from assocpoint and assocpair/assocpairfeat and tries to combine them into one result: we are looking for pairs of sequence alignment positions where (a) each of the positions is significantly associated with the feature (for significance level see parameter significance_level), and (b) the sequence states of both positions are associated. Pairs of positions fulfilling these two criteria may e.g. carry compensatory mutations. If at least one such mutation is found, a detailed output is additionally created with possible compensatory mutations

Value

A csv file with p-value for alignment position pairs and if existing a csv file with possible compensatory mutations.

Author(s)

Bettina Budeus

Examples

#Input files
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
assocpoint_result <- system.file("extdata", "assocpoint_results.csv", package="SeqFeatR")
assocpairfeat_result <- system.file("extdata", "assocpairfeat_results.csv", package="SeqFeatR")

#Usage
assocpointpair(
	path_to_file_sequence_alignment=fasta_input, 
	path_to_file_assocpoint_csv_result=assocpoint_result, 
	path_to_file_assocpairfeat_csv_result=assocpairfeat_result, 
	significance_level=0.05,
	save_name_csv="assocpointpair_result.csv",
	save_name_pos="possible_compensatory_mutation.csv")
#Input files
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
assocpoint_result <- system.file("extdata", "assocpoint_results.csv", package="SeqFeatR")
assocpairfeat_result <- system.file("extdata", "assocpairfeat_results.csv", package="SeqFeatR")

#Usage
assocpointpair(
	path_to_file_sequence_alignment=fasta_input, 
	path_to_file_assocpoint_csv_result=assocpoint_result, 
	path_to_file_assocpairfeat_csv_result=assocpairfeat_result, 
	significance_level=0.05,
	save_name_csv="assocpointpair_result.csv",
	save_name_pos="possible_compensatory_mutation.csv")

Searches for associations of a tuple of alignment positions with feature(s)

Description

Searches for associations of tuple of nucleotide or amino acid sequence alignment positions with feature(s).

Usage

assoctuple(path_to_file_sequence_alignment, path_to_file_assocpoint_csv_result, threshold,
	min_number_of_elements_in_tuple, max_number_of_elements_in_tuple,
	save_name_csv, column_of_feature, column_of_position, column_of_p_values, 
        column_of_aa, A11, A12, A21, A22, B11, B12, B21, B22, one_feature, feature)assoctuple(path_to_file_sequence_alignment, path_to_file_assocpoint_csv_result, threshold,
	min_number_of_elements_in_tuple, max_number_of_elements_in_tuple,
	save_name_csv, column_of_feature, column_of_position, column_of_p_values, 
        column_of_aa, A11, A12, A21, A22, B11, B12, B21, B22, one_feature, feature)

Arguments

`path_to_file_sequence_alignment`	FASTA file with sequence alignment. For reference see example file.
`path_to_file_assocpoint_csv_result`	results from SeqFeatRs assocpoint
`threshold`	p-value threshold for sequence alignment positions to be considered.
`min_number_of_elements_in_tuple`	minimal number of members in tuple.
`max_number_of_elements_in_tuple`	maximal number of members in tuple.
`save_name_csv`	name of file to which results are saved in csv format.
`column_of_feature`	column number in which feature is located for which analysis should be done.
`column_of_position`	column number in which sequence position is located.
`column_of_p_values`	column number from which p-values should be taken. See details.
`column_of_aa`	column number from which amino acids should be taken. See details.
`A11`	position of start of first HLA A allele in header line of FASTA file.
`A12`	position of end of first HLA A allele in header line of FASTA file.
`A21`	position of start of second HLA A allele in header line of FASTA file.
`A22`	position of end of second HLA A allele in header line of FASTA file.
`B11`	position of start of first HLA B allele in header line of FASTA file.
`B12`	position of end of first HLA B allele in header line of FASTA file.
`B21`	position of start of second HLA B allele in header line of FASTA file.
`B22`	position of end of second HLA B allele in header line of FASTA file.
`one_feature`	if there is only one feature.
`feature`	feature identifier which should be analyzed. See details.

Details

For each tuple of sequence alignment positions, Fisher's exact test is evaluated for a 2-by-2 contingency table of amino acid tuple (or nucleic acid tuple) vs. feature. The resulting p-values are returned in a table.

For this to work properly the result from SeqFeatRs assocpoint can be used, but also a user generated csv file in which at least one column describes the sequence position and another the p-value at this position, as well as a FASTA file from which those file originated.

assoctuple takes only those sequence positions which have a p-value lower than the given threshold ('threshold'). Be aware that for big datasets the calculation time can be high and a calculation of every position with every other position will most definitely result in low quality data.

Please be also aware that it just uses the position in your csv file. If this is NOT the correct position, because of removal of empty or near empty alignment positions, correct the csv file before starting.

Use the same FASTA file you had used for assocpoint!

The sequence positions to be included in this analysis are normally chosen from the corrected p-values (but can be anything else as long as it is between 0 and 1, even an own added column). The size of the tuples can be anything from 2 to number of rows (= number of alignment positions) in the csv input file. The input sequence alignment may be consist either of DNA sequences (switch dna = TRUE) or amino acid sequences (dna = FALSE). Undetermined nucleotides or amino acids have to be indicated by the letter "X".

Features may be HLA types, indicated by four blocks in the FASTA comment lines. The positions of these blocks in the comment lines are defined by parameters A11, ..., B22. For patients with a homozygous HLA allele the second allele has to be "00" (without the double quotes). For non-HLA-type features, set option one_feature=TRUE. The value of the feature (e.g. 'yes / no', or '1 / 2 / 3') should then be given at the end of each FASTA comment, separated from the part before that by a semicolon.

The analysis is done only for one single feature. This is chosen by either 'feature' if there is only 'one_feature', or column_of_feature if there are HLA types.

Value

A csv list of tuple positions and the p-value of there association.

Author(s)

Bettina Budeus

Examples

#Input files
## Not run: 
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
assocpoint_result <- system.file("extdata", "assocpoint_results.csv", package="SeqFeatR")

#Usage
assoctuple(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_assocpoint_csv_result=assocpoint_result,
	threshold=0.2,
	min_number_of_elements_in_tuple=2,
	max_number_of_elements_in_tuple=2,
	save_name_csv="assoctuple_result.csv",
	column_of_feature=9,
	column_of_position=1,
	column_of_p_values=9,
	column_of_aa=12,
	A11=10,
	A12=11,
	A21=13,
	A22=14,
	B11=17,
	B12=18,
	B21=20,
	B22=21,
	one_feature=FALSE,
	feature="")

## End(Not run)
#Input files
## Not run: 
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
assocpoint_result <- system.file("extdata", "assocpoint_results.csv", package="SeqFeatR")

#Usage
assoctuple(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_assocpoint_csv_result=assocpoint_result,
	threshold=0.2,
	min_number_of_elements_in_tuple=2,
	max_number_of_elements_in_tuple=2,
	save_name_csv="assoctuple_result.csv",
	column_of_feature=9,
	column_of_position=1,
	column_of_p_values=9,
	column_of_aa=12,
	A11=10,
	A12=11,
	A21=13,
	A22=14,
	B11=17,
	B12=18,
	B21=20,
	B22=21,
	one_feature=FALSE,
	feature="")

## End(Not run)

Compares ancestral sequence with sequence alignment

Description

Generates from a nexus tree file the possible ancestral sequence and compares this with the sequence alignment.

Usage

comparewithancestral(path_to_file_sequence_alignment = NULL,
    path_to_file_nexus_tree = NULL, save_name_csv, 
    patnum_threshold = 1, significance_level = 0.05,
    A11, A12, A21, A22, B11, B12, B21, B22)comparewithancestral(path_to_file_sequence_alignment = NULL,
    path_to_file_nexus_tree = NULL, save_name_csv, 
    patnum_threshold = 1, significance_level = 0.05,
    A11, A12, A21, A22, B11, B12, B21, B22)

Arguments

`path_to_file_sequence_alignment`	FASTA file with sequence alignment. For reference see example file.
`path_to_file_nexus_tree`	nexus tree file of sequences alignment. For reference see example file.
`save_name_csv`	name of file to which results are saved in csv format.
`patnum_threshold`	minimum number of patients per HLA type to consider in calculation.
`significance_level`	significance level in Fisher's exact test.
`A11`	position of start of first HLA A allele in header line of FASTA file.
`A12`	position of end of first HLA A allele in header line of FASTA file.
`A21`	position of start of second HLA A allele in header line of FASTA file.
`A22`	position of end of second HLA A allele in header line of FASTA file.
`B11`	position of start of first HLA B allele in header line of FASTA file.
`B12`	position of end of first HLA B allele in header line of FASTA file.
`B21`	position of start of second HLA B allele in header line of FASTA file.
`B22`	position of end of second HLA B allele in header line of FASTA file.

Details

Uses the R-packages ape and phangorn to create from a given nexus tree file the ancestral sequence. For every sequence position the ancestral amino acid is compared with the amino acid of every HLA type and with a Fisher's exact test the significant differences are calculated.

Value

Generates a csv file with p-values and q-values (generated by the R-package qvalue) for every HLA type and sequence position with a p-value below given threshold (significance_level).

Note

BEWARE! You have to name the FASTA sequences (= comment line in the file) so that the R-packages ape and phangorn can read and write them correctly. It will replace any space character with "_" (without the double quotes) and ignore ":" (without the double quotes)!

Author(s)

Bettina Budeus

Examples

#Input files
## Not run: 
fasta_input <- system.file("extdata", "Example_aa_for_cwa.fasta", package="SeqFeatR")
tree_input <- system.file("extdata", "Example_tree.nh", package="SeqFeatR")

#Usage
comparewithancestral(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_nexus_tree=tree_input,
	save_name_csv="hlaTree_results.csv",
	patnum_threshold=1,
	significance_level=0.05,
	A11=10,
	A12=11,
	A21=13,
	A22=14,
	B11=17,
	B12=18,
	B21=20,
	B22=21)

## End(Not run)
#Input files
## Not run: 
fasta_input <- system.file("extdata", "Example_aa_for_cwa.fasta", package="SeqFeatR")
tree_input <- system.file("extdata", "Example_tree.nh", package="SeqFeatR")

#Usage
comparewithancestral(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_nexus_tree=tree_input,
	save_name_csv="hlaTree_results.csv",
	patnum_threshold=1,
	significance_level=0.05,
	A11=10,
	A12=11,
	A21=13,
	A22=14,
	B11=17,
	B12=18,
	B21=20,
	B22=21)

## End(Not run)

Tries to identify a founder effect in results from SeqFeatRs assocpair

Description

Takes the result of co-mutation analysis like SeqFeatRs assocpair function and adds a column with a note if the row is in the same branch.

Usage

foundereffectfinder(path_to_file_sequence_alignment = NULL,
    path_to_file_assocpair_csv_result = NULL, path_to_file_nexus_tree = NULL,
    save_name_csv, threshold = 7)foundereffectfinder(path_to_file_sequence_alignment = NULL,
    path_to_file_assocpair_csv_result = NULL, path_to_file_nexus_tree = NULL,
    save_name_csv, threshold = 7)

Arguments

`path_to_file_sequence_alignment`	FASTA file with sequence alignment. For reference see example file.
`path_to_file_assocpair_csv_result`	csv file with results from SeqFeatRs assocpair function. For reference see example file.
`path_to_file_nexus_tree`	nexus file with tree data for the given sequence alignment. For reference see example file.
`save_name_csv`	name of file to which results are saved in csv format.
`threshold`	threshold for the number of allowed occurrences of tips in one branch.

Details

Uses the results from SeqFeatRs assocpair analysis and a generated tree for the same sequences in nexus format (for tree generation, e.g http://mafft.cbrc.jp/alignment/server/ can be used). For every pair of amino acids the number of sequences with both amino acids inside one branch is calculated. If this number exceeds 'threshold' times the number of found associations, this association is considered to be due to a founder effect.

Value

Input file with an added column with founder effect information.

Note

Only use files generated with SeqFeatRs assocpair or assoctuple.

Author(s)

Bettina Budeus

References

Mayr, Ernst (1954). "Change of genetic environment and evolution". In Julian Huxley. Evolution as a Process. London: George Allen & Unwin. OCLC 974739

Examples

#Input files
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
assocpair_result <- system.file("extdata", "assocpair_results.csv", package="SeqFeatR")
tree_input <- system.file("extdata", "Example_tree.nh", package="SeqFeatR")

#Usage
foundereffectfinder(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_assocpair_csv_result=assocpair_result,
	path_to_file_nexus_tree=tree_input,
	save_name_csv="foundereffectfinder_result.csv",
	threshold=7)
#Input files
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
assocpair_result <- system.file("extdata", "assocpair_results.csv", package="SeqFeatR")
tree_input <- system.file("extdata", "Example_tree.nh", package="SeqFeatR")

#Usage
foundereffectfinder(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_assocpair_csv_result=assocpair_result,
	path_to_file_nexus_tree=tree_input,
	save_name_csv="foundereffectfinder_result.csv",
	threshold=7)

Get frequencies for amino acids/nucleotides in alignment window (epitope)

Description

Gets frequencies of amino acids/nucleotides inside an epitope out of an alignment. Epitope start and end positions must be in the range of the length of the alignment, starting with the first position of the alignment as one. You can add the whole sequence consensus, but also a part of it.

Usage

getfreqs(path_to_file_sequence_alignment = NULL, save_csv, save_png, 
 epitope_position_start, epitope_position_end,
 path_to_file_consensus, patnum_threshold = 1, A11, A12, A21, A22,
 B11, B12, B21, B22, one_feature=FALSE)getfreqs(path_to_file_sequence_alignment = NULL, save_csv, save_png, 
 epitope_position_start, epitope_position_end,
 path_to_file_consensus, patnum_threshold = 1, A11, A12, A21, A22,
 B11, B12, B21, B22, one_feature=FALSE)

Arguments

`path_to_file_sequence_alignment`	a FASTA file with sequence data. For reference please look in example file.
`save_csv`	name for the csv output.
`save_png`	name for the png/svg output.
`epitope_position_start`	starting position of the epitope in the alignment.
`epitope_position_end`	ending position of the epitope in the alignment.
`path_to_file_consensus`	a FASTA file with the consensus sequence from epi_pos_start till epi_pos_end.
`patnum_threshold`	the minimum number of patients of one HLA type to consider in the calculation.
`A11`	the position of the start of the first HLA A allele in the description block of the FASTA file.
`A12`	the position of the end of the first HLA A allele in the description block of the FASTA file.
`A21`	the position of the start of the second HLA A allele in the description block of the FASTA file.
`A22`	the position of the end of the second HLA A allele in the description block of the FASTA file.
`B11`	the position of the start of the first HLA B allele in the description block of the FASTA file.
`B12`	the position of the end of the first HLA B allele in the description block of the FASTA file.
`B21`	the position of the start of the second HLA B allele in the description block of the FASTA file.
`B22`	the position of the end of the second HLA B allele in the description block of the FASTA file.
`one_feature`	if there is only one feature.

Details

This function counts the number of amino acids NOT being the same as the consensus and generates an output graphic as well as a result csv file.

The features may be HLA types, indicated by four blocks in the FASTA comment lines. The positions of these blocks in the comment lines are defined by parameters A11, ..., B22. For patients with a homozygous HLA allele the second allele has to be "00" (without the double quotes). For non-HLA-type features, set option one_feature=TRUE. The value of the feature (e.g. 'yes / no', or '1 / 2 / 3') should then be given at the end of each FASTA comment, separated from the part before that by a semicolon.

Value

A csv file with a row for each feature and two columns (absolute and relative) for the frequencies of the amino acids in the epitope.

Author(s)

Bettina Budeus

Examples

#Input files
	fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
	consensus_input <- system.file("extdata", "Example_Consensus_aa.fasta", package="SeqFeatR")
	
#Usage
getfreqs(
	path_to_file_sequence_alignment=fasta_input,
	save_csv="getfreqs_result.csv",
	save_png="getfreqs_result.png",
	epitope_position_start=1,
	epitope_position_end=40,
	path_to_file_consensus=consensus_input,
	patnum_threshold=1,
	A11=10,
	A12=11,
	A21=13,
	A22=14,
	B11=17,
	B12=18,
	B21=20,
	B22=21,
	one_feature=FALSE)
#Input files
	fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
	consensus_input <- system.file("extdata", "Example_Consensus_aa.fasta", package="SeqFeatR")
	
#Usage
getfreqs(
	path_to_file_sequence_alignment=fasta_input,
	save_csv="getfreqs_result.csv",
	save_png="getfreqs_result.png",
	epitope_position_start=1,
	epitope_position_end=40,
	path_to_file_consensus=consensus_input,
	patnum_threshold=1,
	A11=10,
	A12=11,
	A21=13,
	A22=14,
	B11=17,
	B12=18,
	B21=20,
	B22=21,
	one_feature=FALSE)

Visualize odds ratios and p-values from SeqFeatRs assocpoint

Description

Creates odds ratio/p-value plots for every feature and sequence alignment position.

Usage

orPlot(path_to_file_assocpoint_results = NULL,
	save_name_pdf, separator = ";",
	number_of_cases = 2, odds_column_position = c(6),
	p_value_column_position = c(2), name_column_position = c(2), freq = 7,
	sequence_column_position = NULL, max_y_axis = 50, interval = 10,
        has_color = 0, bias = 5, p_or_od = "P")orPlot(path_to_file_assocpoint_results = NULL,
	save_name_pdf, separator = ";",
	number_of_cases = 2, odds_column_position = c(6),
	p_value_column_position = c(2), name_column_position = c(2), freq = 7,
	sequence_column_position = NULL, max_y_axis = 50, interval = 10,
        has_color = 0, bias = 5, p_or_od = "P")

Arguments

`path_to_file_assocpoint_results`	csv file with results from SeqFeatRs assocpoint function. For reference see example file.
`save_name_pdf`	name of file to which results are saved in csv format.
`separator`	letter with which csv is separated (usually ";" or ",", or "\t").
`number_of_cases`	number of different features in input file.
`odds_column_position`	column number of first odds ratios.
`p_value_column_position`	column number of first p-value.
`name_column_position`	column number of feature name in first row.
`freq`	frequency of repeat in columns with more than two cases.
`sequence_column_position`	column number of first amino acid/nucleotide sequence.
`max_y_axis`	estimated guess of the best p-value/ highest Odds ratio. Usuall the number behind the e as positive number.
`interval`	interval on the y axis.
`has_color`	if there should be presented another kind of information as a color feature. See details.
`bias`	bias of colors. Select with integer values if you want the first color further down. See details.
`p_or_od`	if p-values or OR should be plotted as height. "P" for p-value, "OR" for Odds ratios.

Details

Creates from SeqFeatRs assocpoint results a graphical output where each position in the sequence alignment is shown as bar. The height of this bar can be the log10 of p-value at this position or the odds ratio, the direction information (up or down) based upon odds ratio (below one is down, above one is up). If the user adds a column number for an amino acid or DNA base, this sequence is added above each bar.

The bar plot can be additionally colored by another type of information (e.g. display p-values as height, odds ratio as direction and sequence entropy as color). The source of this color information has to be added in 'has_color' as a column number. This column can be added manually before creating this plot. The column should contain only numbers between zero and one. A bias can be given to change the color distribution (see R-function colorRampPalette).

Author(s)

Bettina Budeus

Examples

#Input file
assocpoint_result <- system.file("extdata", "assocpoint_results.csv", package="SeqFeatR")
	
#Usage
orPlot(
	path_to_file_assocpoint_results=assocpoint_result, 
	save_name_pdf="orPlot.pdf",
	separator=";",
	number_of_cases=8,
	odds_column_position=6,
	p_value_column_position=2,
	name_column_position=2,
	freq=7,
	sequence_column_position=NULL,
	max_y_axis=4,
	interval=0.2,
	has_color=0,
	bias=5,
	p_or_od="P")
#Input file
assocpoint_result <- system.file("extdata", "assocpoint_results.csv", package="SeqFeatR")
	
#Usage
orPlot(
	path_to_file_assocpoint_results=assocpoint_result, 
	save_name_pdf="orPlot.pdf",
	separator=";",
	number_of_cases=8,
	odds_column_position=6,
	p_value_column_position=2,
	name_column_position=2,
	freq=7,
	sequence_column_position=NULL,
	max_y_axis=4,
	interval=0.2,
	has_color=0,
	bias=5,
	p_or_od="P")

Calculate q-values from p-values

Description

Calculates and adds a column of q-values to a list of given p-values.

Usage

qvalues(path_to_file_csv = NULL, save_name_csv)qvalues(path_to_file_csv = NULL, save_name_csv)

Arguments

`path_to_file_csv`	file with a column with p-values. Has to contain a column called "p_value" (without the double quotes). For reference see example file.
`save_name_csv`	name of file to which results are saved in csv format.

Details

Takes a csv file with a column which is called 'p_values' and uses the qvalues package to calculate from this column the corresponding q-values.

Author(s)

Bettina Budeus

Examples

#Input file
file_input <- system.file("extdata", "assocpair_results.csv", package="SeqFeatR")

#Usage
qvalues(
	path_to_file_csv=file_input, 
	save_name_csv="q-values_result.csv")
#Input file
file_input <- system.file("extdata", "assocpair_results.csv", package="SeqFeatR")

#Usage
qvalues(
	path_to_file_csv=file_input, 
	save_name_csv="q-values_result.csv")

Rewrite result from SeqFeatRs assoctuple

Description

Function to reqrite output from SeqFeatRs assoctuple into an usable form for SeqFeatRs tartan.

Usage

rewritetuple(path_to_file_assoctuple_csv_result = NULL, save_name_csv, 
first_position, second_position, value_position, separator, threshold)rewritetuple(path_to_file_assoctuple_csv_result = NULL, save_name_csv, 
first_position, second_position, value_position, separator, threshold)

Arguments

`path_to_file_assoctuple_csv_result`	csv file with results from SeqFeatRs assoctuple function. For reference see example file.
`save_name_csv`	name of file to which results are saved in csv format.
`first_position`	column position of first sequence position.
`second_position`	column position of second sequence position.
`value_position`	column position of p-value.
`separator`	separator of csv input file (usually ";", ",", "\t").
`threshold`	p-value threshold, below which data is to be included in result.

Details

Extracts the two sequence positions and p-value if it is below the threshold and generates output csv file for SeqFeatRs tartan with these values.

Author(s)

Bettina Budeus

Examples

#Input file
assoctuple_result <- system.file("extdata", "assoctuple_result.csv", package="SeqFeatR")

#Usage
rewritetuple(
	path_to_file_assoctuple_csv_result=assoctuple_result, 
	save_name_csv="rewritetuple_result.csv", 
	first_position=3, 
	second_position=4, 
	value_position=10, 
	separator="\t", 
	threshold=0.1)
#Input file
assoctuple_result <- system.file("extdata", "assoctuple_result.csv", package="SeqFeatR")

#Usage
rewritetuple(
	path_to_file_assoctuple_csv_result=assoctuple_result, 
	save_name_csv="rewritetuple_result.csv", 
	first_position=3, 
	second_position=4, 
	value_position=10, 
	separator="\t", 
	threshold=0.1)

GUI for SeqFeatR

Description

A GUI for all program features of SeqFeatR

Usage

SeqFeatR_GUI()SeqFeatR_GUI()

Details

This is a GUI for the program SeqFeatR. The first tab of the GUI deals with the association of alignment positions with features (e.g. HLA types); a subset of the functionality is provided by a web-server at http://seqfeatr.zmb.uni-due. See also video tutorial at https://www.youtube.com/watch?v=-CYidGPE6dw and vignette in this package.

Author(s)

Bettina Budeus

Examples

## Not run: SeqFeatR_GUI()

## Not run: SeqFeatR_GUI()

A small version of manhattan plot

Description

Creates a small version of the manhattan plot from assocpoint with annotated alignment positions.

Usage

smallmanhattan(path_to_file_csv, save_name_png, save_name_svg, feature,
       corrected=FALSE, x_axis_breaks)smallmanhattan(path_to_file_csv, save_name_png, save_name_svg, feature,
       corrected=FALSE, x_axis_breaks)

Arguments

`path_to_file_csv`	csv file with results from assocpoint. For reference see example file.
`save_name_png`	name of file to which graphic is saved in png format.
`save_name_svg`	name of file to which graphic is saved in svg format.
`feature`	column number in input csv file with the values. See details.
`corrected`	if the corrected p-values should be used.
`x_axis_breaks`	x axis breaks to be displayed. See details.

Details

The input file should be a result file from SeqFeatRs assocpoint.

The feature, that should be displayed as an extra manhattan plot, must be given just like it is displayed in the csv file, e.g for HLA-A2, insert "A2" as value for feature. If you want to show the corrected p-values, change corrected to TRUE.

The x_axis_breaks will mark ticks on the x axis, so that a special alignment position can be displayed. Those axis breaks have to be given in the following format only:
- starting an ending with quotes
- between those quotes comma separated ascending alignemnt position numbers (pleas keep in mind, that the plot will only display a little more than the highest alignment number, whatever big number you have given here)
- no blanks
- can be left empty (""), then breaks with 50 positions will be displayed

Value

A small version of the manhattan plot as png/svg picture file. svg files can be changed more easily than png files.

Author(s)

Bettina Budeus

Examples

#Input files
## Not run: 
assocpoint_result <- system.file("extdata", "assocpoint_results.csv", package="SeqFeatR")

#Usage
smallmanhattan(
	assocpoint_result,
        save_name_png = "small_manhattan.png", 
        save_name_svg = "small_manhattan.svg", 
        feature = "A2",
        corrected = FALSE, 
        x_axis_breaks = "1,10,16,20,30,40")

## End(Not run)
#Input files
## Not run: 
assocpoint_result <- system.file("extdata", "assocpoint_results.csv", package="SeqFeatR")

#Usage
smallmanhattan(
	assocpoint_result,
        save_name_png = "small_manhattan.png", 
        save_name_svg = "small_manhattan.svg", 
        feature = "A2",
        corrected = FALSE, 
        x_axis_breaks = "1,10,16,20,30,40")

## End(Not run)

Visualize association of mutation pairs and features from two (comparable) analyses.

Description

Combines the results of two of SeqFeatRs assocpair or assoctuple analyses in one plot.

Usage

tartan(path_to_file_assocpair_csv_result=NULL,
 path_to_file_assocpair_csv_result2=NULL, save_name_pdf, space,
 colors, name_positions, names, ticks, first_position_1,
 second_position_1, value_1, first_position_2, second_position_2,
 value_2, with_distance_matrix, path_to_distance_matrix)tartan(path_to_file_assocpair_csv_result=NULL,
 path_to_file_assocpair_csv_result2=NULL, save_name_pdf, space,
 colors, name_positions, names, ticks, first_position_1,
 second_position_1, value_1, first_position_2, second_position_2,
 value_2, with_distance_matrix, path_to_distance_matrix)

Arguments

`path_to_file_assocpair_csv_result`	csv file with results from SeqFeatRs assocpair or assoctuple function. For reference see example file.
`path_to_file_assocpair_csv_result2`	csv file with results from SeqFeatRs assocpair or assoctuple function. For reference see example file.
`save_name_pdf`	name of file to which results are saved in pdf format.
`space`	space between blocks. See details.
`colors`	the colors of the tartan plot. See details.
`name_positions`	Positions on x and y axis where the labels should be inserted.
`names`	labels to be inserted on x and y axis. Beware! name_position and names must have the same length!
`ticks`	ticks which should mark interesting spots and are also the breakpoints on which the blocks are separated. See details.
`first_position_1`	column in which first alignment position is located in the first csv file.
`second_position_1`	column in which second alignment position is located in the first csv file.
`value_1`	column in which value to be shown is located in first csv file.
`first_position_2`	column in which first alignment position is located in the second csv file.
`second_position_2`	column in which second alignment position is located in the second csv file.
`value_2`	column in which value to be shown is located in csv second file.
`with_distance_matrix`	if there is a distance matrix available and should be used.
`path_to_distance_matrix`	path to the distance matrix. Can be zero or empty.

Details

For each sequence x sequence position for one feature the corresponding (p-)values are added as a colored dot.

The input files may be the results from SeqFeatRs assocpair or assoctuple analyses, but any csv file with two different columns of sequence positions and one column with values is sufficient. The values should be in a comparable range, but they can be other information as p-values (e.g. mutual or direct information or distance matrix given as an extra input file).

The displayed colors are chosen by the user. The format for 'colors' must be color names in quotes (e.g. "red", "blue", etc.) separated by "," (without the double quotes). More colors will give a better distinction.

To distinguish different parts of the sequence (e.g proteins, markers, etc.) the user may add sequence positions as ticks ('ticks') at which the graphical display is split and a little space ('space') is inserted. The user can also add names to be displayed on the x-axis/y-axis ('name_positions', 'names').

Author(s)

Bettina Budeus

Examples

#Input files
assocpair_result <- system.file(
                    "extdata", "shared_mutations_result_for_tartan.csv", package="SeqFeatR")
assocpair_result2 <- system.file(
                    "extdata", "assocpair_results.csv", package="SeqFeatR")

#Usage
tartan(
	path_to_file_assocpair_csv_result=assocpair_result,
	path_to_file_assocpair_csv_result2=assocpair_result2,
	save_name_pdf="tartan_plot.pdf",
	space=5,
	colors=c("wheat", "darkblue", "black", "green"),
	name_positions=c(1,30),
	names=c("S","F"),
	ticks=c(10,30),
	first_position_1=2,
	second_position_1=3,
	value_1=4,
	first_position_2=2,
	second_position_2=3,
	value_2=11,
	with_distance_matrix=FALSE,
	path_to_distance_matrix=NULL)
#Input files
assocpair_result <- system.file(
                    "extdata", "shared_mutations_result_for_tartan.csv", package="SeqFeatR")
assocpair_result2 <- system.file(
                    "extdata", "assocpair_results.csv", package="SeqFeatR")

#Usage
tartan(
	path_to_file_assocpair_csv_result=assocpair_result,
	path_to_file_assocpair_csv_result2=assocpair_result2,
	save_name_pdf="tartan_plot.pdf",
	space=5,
	colors=c("wheat", "darkblue", "black", "green"),
	name_positions=c(1,30),
	names=c("S","F"),
	ticks=c(10,30),
	first_position_1=2,
	second_position_1=3,
	value_1=4,
	first_position_2=2,
	second_position_2=3,
	value_2=11,
	with_distance_matrix=FALSE,
	path_to_distance_matrix=NULL)

Visualize pairs of alignment positions that jointly mutate

Description

Creates a pdf output with a visualization of the results from the analysis from SeqFeatRs assocpair function.

Usage

visualizepair(path_to_file_assocpair_csv_result = NULL, 
    save_name_pdf, significance_level = 0.01)visualizepair(path_to_file_assocpair_csv_result = NULL, 
    save_name_pdf, significance_level = 0.01)

Arguments

`path_to_file_assocpair_csv_result`	csv file with results from SeqFeatRs assocpair function. For reference see example file.
`save_name_pdf`	name of file to which results are saved in pdf format.
`significance_level`	significance value below which the results of the analysis are considered to be relevant enough to be plotted.

Details

An output is created with a sequence position versus sequence position graphic. Every dot color(one position with the other position) is based on its p-value. The color code is on the right side. In contrast to visualizepairfeat, no feature information will be used.

Note

Only use files generated from SeqFeatRs assocpair function.

Author(s)

Bettina Budeus

Examples

#Input file
assocpair_result <- system.file("extdata", "assocpair_results.csv", package="SeqFeatR")

#Usage
visualizepair(
	path_to_file_assocpair_csv_result=assocpair_result,
	save_name_pdf="vispair_plot.pdf",
	significance_level=0.05)
#Input file
assocpair_result <- system.file("extdata", "assocpair_results.csv", package="SeqFeatR")

#Usage
visualizepair(
	path_to_file_assocpair_csv_result=assocpair_result,
	save_name_pdf="vispair_plot.pdf",
	significance_level=0.05)

Visualize pairs of sequence alignment positions associated with sequence feature.

Description

Creates a pdf output with a visualization of the results from the analysis from SeqFeatRs assocpairfeat function.

Usage

visualizepairfeat(path_to_file_assocpairfeat_csv_result = NULL, save_name_pdf, 
    significance_level = 0.01)visualizepairfeat(path_to_file_assocpairfeat_csv_result = NULL, save_name_pdf, 
    significance_level = 0.01)

Arguments

`path_to_file_assocpairfeat_csv_result`	csv file with results from SeqFeatRs assocpairfeat function. For reference see example file.
`save_name_pdf`	name of file to which results are saved in pdf format.
`significance_level`	significance value below which the results of the analysis are considered to be relevant enough to be plotted.

Details

For every feature (if only one p-value is below given threshold ('significance_level')) an output page (pdf page) is created with a sequence position versus sequence position graphic. Every dot color(one position with the other position) is based on its p-value. The color code is on the right side.

Note

Only use files generated from SeqFeatRs assocpairfeat function.

Author(s)

Bettina Budeus

Examples

#Input file
assocpairfeat_result <- system.file("extdata", "assocpairfeat_results.csv", package="SeqFeatR")

#Usage
visualizepairfeat(
	path_to_file_assocpairfeat_csv_result=assocpairfeat_result,
	save_name_pdf="vispairfeat_plot.pdf", 
	significance_level=0.05)
#Input file
assocpairfeat_result <- system.file("extdata", "assocpairfeat_results.csv", package="SeqFeatR")

#Usage
visualizepairfeat(
	path_to_file_assocpairfeat_csv_result=assocpairfeat_result,
	save_name_pdf="vispairfeat_plot.pdf", 
	significance_level=0.05)

Create volcano plot

Description

Creates a pdf output file with a volcano plot out of the results from SeqFeatRs assocpoint.

Usage

volcanoplot(path_to_file_assocpoint_csv_result,
 save_name_pdf, p_values_pos, odds_pos, pos_pos,
 level_of_sig_p, level_of_sig_odds)volcanoplot(path_to_file_assocpoint_csv_result,
 save_name_pdf, p_values_pos, odds_pos, pos_pos,
 level_of_sig_p, level_of_sig_odds)

Arguments

`path_to_file_assocpoint_csv_result`	csv file with results from SeqFeatRs assocpoint. For reference see example file.
`save_name_pdf`	name of file to which results are saved in pdf format.
`p_values_pos`	column number with p-values.
`odds_pos`	column number with odds ratios.
`pos_pos`	column number with positions.
`level_of_sig_p`	level of significance p values.
`level_of_sig_odds`	level of significance odds ratios.

Details

Plots log$_10$ Odds ratios versus -log$_10$ p-values and annotates the dots with sequence positions.

Author(s)

Bettina Budeus

Examples

#Input file
assocpoint_result <- system.file("extdata", "assocpoint_results.csv", package="SeqFeatR")

#Usage
volcanoplot(
	path_to_file_assocpoint_csv_result=assocpoint_result,
	save_name_pdf="volcano_plot.pdf",
	p_values_pos = 3,
	odds_pos = 6,
	pos_pos = 1,
	level_of_sig_p = 0.05,
	level_of_sig_odds = 1)
#Input file
assocpoint_result <- system.file("extdata", "assocpoint_results.csv", package="SeqFeatR")

#Usage
volcanoplot(
	path_to_file_assocpoint_csv_result=assocpoint_result,
	save_name_pdf="volcano_plot.pdf",
	p_values_pos = 3,
	odds_pos = 6,
	pos_pos = 1,
	level_of_sig_p = 0.05,
	level_of_sig_odds = 1)

Package 'SeqFeatR'

Help Index

Finding pairs of alignment positions that jointly mutate

Description

Usage

Arguments

Details

Value

Note

Author(s)

See Also

Examples

Finding pairs of sequence alignment positions associated with sequence feature.

Description

Usage

Arguments

Details

Value

Note

Author(s)

See Also

Examples

Searches for associations of single alignment positions with feature(s)

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Searches for associations of single alignment positions with feature(s) with a simple hierarchical model

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Compare results of SeqFeatRs assocpoint with SeqFeatRs assocpair, assocpairfeat

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Searches for associations of a tuple of alignment positions with feature(s)

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Compares ancestral sequence with sequence alignment

Description

Usage

Arguments

Details

Value

Note

Author(s)

Examples

Tries to identify a founder effect in results from SeqFeatRs assocpair

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also