Title: | Identification of Hybrid Peptides in Immunopeptidomic Analyses |
---|---|
Description: | Tool for the analysis of Mass Spectrometry (MS) data in the context of immunopeptidomic analysis, for the identification of hybrid peptides and the prediction of the binding affinity of all peptides using 'netMHCpan' <doi:10.1093/nar/gkaa379>, while providing a summary of the netMHCpan output. 'RHybridFinder' (RHF) is intended for researchers who want to analyze their MS data in order to identify potential spliced peptides. This package, developed mainly in base R, is based on the workflow published by Faridi et al. in 2018 <doi:10.1126/sciimmunol.aar3947>. |
Authors: | Frederic Saab [aut, cre], Peter Kubiniok [aut] |
Maintainer: | Frederic Saab <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2024-11-01 06:38:43 UTC |
Source: | CRAN |
checknetMHCpan uses the file from the second (PEAKS) run and analyzes the data with netMHCpan in order to provide the peptide binding affinity to different HLA/MHC alleles.
checknetMHCpan( netmhcpan_directory, netmhcpan_alleles, peptide_rerun, HF_step1_output, export_files = FALSE, export_dir = NULL )
netmhcpan_directory |
the directory in which the netMHCpan file is. |
netmhcpan_alleles |
vector of comma-separated alleles for which these peptides should be analyzed (e.g. HLA_alleles_Exp1 <- c("HLA-A02:01", "HLA-A03:01", "HLA-A24:02")) |
peptide_rerun |
dataframe containing the results of the second run |
HF_step1_output |
the HybridFinder output containing the potential splicing categorizations obtained with the HybridFinder function (HybridFinder) based on the matching of fragment pairs of peptides in 1 or 2 proteins. This parameter can be provided either by loading the exported .csv file or, if the results object is still in the global environment (e.g. results_HF_Exp1), by simply writing "results_HF_Exp1[[1]]". |
export_files |
a boolean parameter for exporting the dataframes as files to the output directory given in the next parameter, Default: FALSE |
export_dir |
the output directory for the results files if export_files=TRUE, Default: NULL |
The ability to check the peptide binding affinity to the different MHC/HLA molecules is essential for assessing the antigenicity of all peptides. This function therefore uses netMHCpan (Reynisson et al., 2020) to generate binding affinity results.
netMHCpan results pertaining to the binding affinity of all peptides in the database search results (dataframe; in long and wide format, with data tidying in the wide format in order to compute the number of HLA molecules to which a peptide is a strong binder, weak binder or non-binder)
netMHCpan results pertaining to the binding affinity of the hybrid peptides to the MHC molecules (dataframe; in long and wide format, with data tidying in the wide format in order to compute the number of HLA molecules to which a peptide is a strong binder, weak binder or non-binder)
the database search rerun with the categorizations already determined in step 1 (HybridFinder function) (dataframe)
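The long-to-wide tidying and binder counting described above can be sketched in base R as follows. This is only an illustration on made-up data: the column names (Peptide, MHC, Rank, Binder) and the %Rank cutoffs of 0.5 and 2 are assumptions for the example, not the package's actual output columns or internal code.

```r
# Hypothetical long-format table: one row per peptide/allele pair.
# Column names and values are invented for illustration.
nm_long <- data.frame(
  Peptide = rep(c("NTYASPRFK", "SLYNTVATL", "AAAWYLWEV"), each = 2),
  MHC     = rep(c("HLA-A02:01", "HLA-A03:01"), times = 3),
  Rank    = c(3.10, 0.20, 0.05, 5.40, 1.20, 0.30),
  stringsAsFactors = FALSE
)

# Classify each pair as strong (SB), weak (WB) or non-binder (NB).
# The 0.5 and 2 %Rank cutoffs are assumed here.
nm_long$Binder <- ifelse(nm_long$Rank < 0.5, "SB",
                  ifelse(nm_long$Rank < 2, "WB", "NB"))

# Long -> wide: one row per peptide, one Rank column per allele.
nm_wide <- reshape(nm_long[, c("Peptide", "MHC", "Rank")],
                   idvar = "Peptide", timevar = "MHC", direction = "wide")

# Count, per peptide, the number of alleles in each binding class.
counts <- as.data.frame.matrix(table(
  nm_long$Peptide,
  factor(nm_long$Binder, levels = c("SB", "WB", "NB"))
))
counts$Peptide <- rownames(counts)

# Attach the counts to the wide table.
nm_wide <- merge(nm_wide, counts, by = "Peptide")
print(nm_wide)
```

The wide table then carries one row per peptide, which is the shape needed to summarize how many HLA molecules a peptide binds strongly, weakly or not at all.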
## Not run:
results_checknetmhcpan_Exp1 <- checknetMHCpan('/usr/local/bin', alleles, peptide_rerun, Exp1_HF_results[[1]])
results_checknetmhcpan_Exp1 <- checknetMHCpan('/usr/local/bin', alleles, peptide_rerun, Exp1_HF_results_denovo_w_spliced)
## End(Not run)
database search results from the first run using PEAKS software, on the raw dataset from the HLA ligand Atlas Human Liver sample of AutDonor17
db_Human_Liver_AUTD17
A data frame with 6108 rows and 18 variables:
Peptide
character Peptide
X.10lgP
double X.10lgP
Mass
double Mass
Length
integer Peptide length
ppm
double ppm
m.z
double mass-to-charge
Z
integer charge
RT
double Retention Time
Area
double Area
Fraction
integer Fraction
Id
integer Id
Scan
character Scan
from.Chimera
character from.Chimera
Source.File
character Source.File
Accession
character Accession
PTM
character Post Translational Modification
AScore
character AScore
Found.By
character Found.By
database search results from the second run using PEAKS software, using the raw file from HLA ligand ATLAS, Human Liver AutDonor 17
db_rerun_Human_Liver_AUTD17
A data frame with 6315 rows and 18 variables:
Peptide
character Peptide
X.10lgP
double X.10lgP
Mass
double Mass
Length
integer Peptide length
ppm
double ppm
m.z
double mass-to-charge
Z
integer charge
RT
double Retention Time
Area
double Area
Fraction
integer Fraction
Id
integer Id
Scan
character Scan
from.Chimera
character from.Chimera
Source.File
character Source.File
Accession
character Accession
PTM
character Post-Translational Modification
AScore
character AScore
Found.By
character Found.By
de novo sequencing results obtained using PEAKS software from the raw file of the Human Liver sample of AutDonor17
denovo_Human_Liver_AUTD17
A data frame with 50114 rows and 18 variables:
Fraction
integer Fraction
Source.File
character Source.File
Feature
character Feature
Peptide
character Peptide
Scan
character Scan
Tag.length
integer Tag.length
Denovo.score
integer Denovo.score
ALC....
integer Average Local Confidence
Length
integer peptide length
m.z
double mass-to-charge(m/z)
z
integer charge
RT
double Retention Time
Predict.RT
character Predict.RT
Mass
double Mass
ppm
double ppm
local.confidence....
character confidence score per residue
tag....0..
character tag
mode
character mode
This function exports the results generated by checknetMHCpan().
export_checknetmhcpan_results(list_checknetMHCpan_results, export_dir)
list_checknetMHCpan_results |
the results generated from running checknetMHCpan() |
export_dir |
the export directory where the results .csv files should be exported. |
This function exports the results of the checknetMHCpan() function. Please note that the same export is also performed by checknetMHCpan() itself if export_files is set to TRUE and a valid export directory is indicated.
exports a folder containing three files
netMHCpan results in long format (the original output)(.csv file)
netMHCpan results tidied (in wide format) so as to summarize the information per peptide (.tsv tab-separated file)
the updated database search results which contain the categorizations of the peptides found in common between the 2nd database search and the HybridFinder function (.csv file)
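As a rough sketch of what exporting such a results list into a folder can look like in base R, the following writes two .csv files and one tab-separated file. The list element names, the folder name and the file names are all hypothetical; the names actually used by the package may differ.

```r
# Hypothetical stand-ins for the three result tables.
results <- list(
  netmhcpan_long = data.frame(Peptide = "NTYASPRFK", MHC = "HLA-A02:01", Rank = 3.1),
  netmhcpan_wide = data.frame(Peptide = "NTYASPRFK", SB = 0, WB = 0, NB = 1),
  db_rerun       = data.frame(Peptide = "NTYASPRFK", Potential_spliceType = "cis")
)

export_sketch <- function(results, export_dir) {
  # Folder and file names below are invented for this example.
  out_dir <- file.path(export_dir, "netMHCpan_results_sketch")
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  write.csv(results$netmhcpan_long,
            file.path(out_dir, "netmhcpan_long.csv"), row.names = FALSE)
  write.table(results$netmhcpan_wide,
              file.path(out_dir, "netmhcpan_wide.tsv"),
              sep = "\t", quote = FALSE, row.names = FALSE)
  write.csv(results$db_rerun,
            file.path(out_dir, "db_rerun_categorized.csv"), row.names = FALSE)
  invisible(out_dir)
}

# Write into a temporary directory and list what was produced.
out_dir <- export_sketch(results, tempdir())
list.files(out_dir)
```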
## Not run:
export_checknetmhcpan_results(results_checknetMHCpan_Human_Liver_AUTD17, folder_Human_Liver_AUTD17)
## End(Not run)
This function exports the results list obtained with the HybridFinder() function.
export_HybridFinder_results(results_list, export_dir)
results_list |
the results list obtained with the HybridFinder() function. |
export_dir |
the export directory |
In order to be able to have the HybridFinder() results list exported, this function will come in handy. Please note that this function is also part of the HybridFinder() function, therefore if you set export_files=TRUE and you indicate the export directory in export_dir in the HybridFinder() function, you would have the exact same outcome.
exports a folder containing three files
the HybridFinder output - the spectra that made it to the end with their respective columns (ALC, m/z, RT, Fraction, Scan) and a categorization column which denotes their potential splice type (-cis, -trans) or whether they are linear (the entire sequence was matched in proteins in the proteome database). Potential cis- and trans-spliced peptides are peptides whose fragments were matched with fragments within one protein or two proteins, respectively.
list of potential hybrid peptides (excluding the linear peptides) (.csv file)
the merged proteome consisting of the reference proteome along with the hybrid proteome added at the end of the file, with the sequence names following the pattern "sp|denovo_HF_fake_protein" followed by a digit (1, 2, 3, 4, etc.) (.fasta file)
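To illustrate the layout of such a merged .fasta file, here is a minimal base R sketch that writes a named character vector as FASTA records. The helper name write_fasta_sketch, the reference entry and the sequences are made up for the example; only the "sp|denovo_HF_fake_protein" naming pattern comes from the description above.

```r
# Write a named character vector as FASTA: names become header lines.
write_fasta_sketch <- function(sequences, path) {
  # Interleave ">header" and sequence lines (rbind fills column-wise).
  lines <- as.vector(rbind(paste0(">", names(sequences)), unname(sequences)))
  writeLines(lines, path)
  invisible(path)
}

# One invented reference entry followed by one "fake" hybrid protein.
merged <- c(
  "sp|P00001|REF1_HUMAN"       = "MKTAYIAKQR",
  "sp|denovo_HF_fake_protein1" = "NTYASPRFKSLYNTVATL"
)

fasta_path <- file.path(tempdir(), "merged_proteome_sketch.fasta")
write_fasta_sketch(merged, fasta_path)
readLines(fasta_path)
```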
## Not run:
export_HybridFinder_results(results_HybridFinder_Human_Liver_AUTD17, folder_Human_Liver_AUTD17)
## End(Not run)
This function exports the results generated by step2_wo_netMHCpan().
export_step2_results(step2_RHF_results_Exp1, export_dir)
step2_RHF_results_Exp1 |
the results generated from running step2_wo_netMHCpan() |
export_dir |
the export directory where you would like to have the .csv file saved. |
Since netMHCpan is not compatible with Windows OS, the package offers an alternative by outputting the input for netMHCpan as well as the database results with their respective categorizations (cis, trans) established in step 1.
exports a folder containing 2 files
the peptide list in a netMHCpan-ready format (.csv file)
the updated database search results which contain the categorizations of the peptides found in common between the 2nd database search and the HybridFinder function (.csv file)
## Not run:
export_step2_results(results_step2_Human_Liver_AUTD17, folder_Human_Liver_AUTD17)
## End(Not run)
This function takes in three mandatory inputs: (1) all denovo candidates (2) database search results and (3) the corresponding proteome fasta file. The function's role is to extract high confidence de novo peptides and to search for their existence in the proteome, whether the entire peptide sequence or its pair fragments (in one or two proteins).
HybridFinder( denovo_candidates, db_search, proteome_db, customALCcutoff = NULL, with_parallel = TRUE, customCores = 6, export_files = FALSE, export_dir = NULL )
denovo_candidates |
dataframe containing all denovo candidate peptides |
db_search |
dataframe containing the database search peptides |
proteome_db |
path to the proteome FASTA file |
customALCcutoff |
the default is calculated based on the median ALC of the assigned spectrum groups (spectrum groups that match in the database search results and in the denovo sequencing results) where also the peptide sequence matches, Default: NULL |
with_parallel |
for faster results, this function also utilizes parallel computing (please read more on parallel computing in order to be sure that your computer does support this), Default: TRUE |
customCores |
custom number of cores, strictly greater than 5, Default: 6 |
export_files |
a boolean parameter for exporting the dataframes as files to the output directory given in the next parameter, Default: FALSE |
export_dir |
the output directory for the results files if export_files=TRUE, Default: NULL |
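The default ALC cutoff described for customALCcutoff can be pictured with the following base R sketch on toy data. It assumes that a spectrum group is identified by its Fraction and Scan values and that the ALC column is simply called ALC; both are assumptions made for the example, not statements about the package's internals.

```r
# Toy de novo candidates (ALC column name simplified for the example).
denovo <- data.frame(
  Fraction = c(1, 1, 1, 2),
  Scan     = c("F1:100", "F1:101", "F1:102", "F2:200"),
  Peptide  = c("NTYASPRFK", "SLYNTVATL", "AAAWYLWEV", "KLGGALQAK"),
  ALC      = c(90, 80, 88, 70),
  stringsAsFactors = FALSE
)

# Toy database search results.
db <- data.frame(
  Fraction = c(1, 1, 2),
  Scan     = c("F1:100", "F1:101", "F2:200"),
  Peptide  = c("NTYASPRFK", "SLYNTVATL", "KLGGALQAR"),
  stringsAsFactors = FALSE
)

# Assigned spectra: same spectrum in both tables AND same peptide sequence.
assigned <- merge(denovo, db, by = c("Fraction", "Scan", "Peptide"))

# Default cutoff: median ALC of those assigned spectra.
alc_cutoff <- median(assigned$ALC)

# Unassigned de novo spectra (spectrum absent from the database search)
# that reach the cutoff are kept as high-confidence candidates.
denovo_key <- paste(denovo$Fraction, denovo$Scan)
db_key     <- paste(db$Fraction, db$Scan)
unassigned <- denovo[!(denovo_key %in% db_key), ]
high_conf  <- unassigned[unassigned$ALC >= alc_cutoff, ]

alc_cutoff
high_conf$Peptide
```

In this toy case only two spectra agree in both searches, so the cutoff is the median of their two ALC values, and one unassigned spectrum passes it.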
This function is based on the algorithm published by Faridi et al. (2018) for the identification and categorization of hybrid peptides. The function described here adopts a slightly modified version of the algorithm for computational efficiency. It starts by extracting unassigned de novo spectra whose Average Local Confidence (ALC, assigned by PEAKS software) is at least equal to the ALC cutoff, which is based on the median ALC of the assigned spectra (spectra shared between the de novo and database search results). The sequences of all peptides are searched against the reference proteome. If there is a hit, the peptide sequence within a spectrum group is considered linear, and each spectrum group is then filtered so as to keep the highest ALC-ranking spectra.
The rest of the spectra (those that did not contain any sequence with an entire match in the proteome database) then undergo a "cutting" procedure in which each sequence yields n-2 fragment pairs, with n being the length of the peptide. For example, a peptide of 9 amino acids such as NTYASPRFK is cut into 7 combinations of two fragments each (e.g. fragment 1: NTY and fragment 2: ASPRFK). These are then searched in the proteome for hits of both peptide fragments within the same protein. Spectra whose sequences have fragment pairs that match within the same protein are considered potentially cis-spliced. Potentially cis-spliced spectrum groups are then filtered based on the highest-ranking ALC.
Spectrum groups not considered potentially cis-spliced are further checked for potential trans-splicing. The peptide sequences are cut again in the same fashion; this time, however, peptide fragment pairs are searched for matches in two proteins. Peptide sequences whose fragment pairs match in two proteins are considered potentially trans-spliced. The same filtering for the highest-ranking ALC is applied within each spectrum group.
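The cutting and matching steps can be sketched in base R as below. This is a simplified illustration, not the package's implementation: the helper names and the toy proteome are hypothetical, ALC filtering is left out, and the sketch enumerates all n-1 two-fragment splits, whereas the text above states n-2 pairs per peptide (which split is excluded is not specified here).

```r
# Cut a peptide into every possible pair of non-empty fragments.
cut_peptide <- function(peptide) {
  n <- nchar(peptide)
  pos <- seq_len(n - 1)  # split after residue 1, 2, ..., n-1
  data.frame(frag1 = substring(peptide, 1, pos),
             frag2 = substring(peptide, pos + 1, n),
             stringsAsFactors = FALSE)
}

# Classify one peptide against a named character vector of protein sequences.
classify_peptide <- function(peptide, proteome) {
  # Whole sequence found in a protein: linear.
  if (any(grepl(peptide, proteome, fixed = TRUE))) return("Linear")
  pairs <- cut_peptide(peptide)
  cis <- FALSE
  trans <- FALSE
  for (i in seq_len(nrow(pairs))) {
    hit1 <- grepl(pairs$frag1[i], proteome, fixed = TRUE)
    hit2 <- grepl(pairs$frag2[i], proteome, fixed = TRUE)
    if (any(hit1 & hit2)) {
      cis <- TRUE               # both fragments within one protein
    } else if (any(hit1) && any(hit2)) {
      trans <- TRUE             # fragments in two different proteins
    }
  }
  # cis takes precedence; trans is only reported when no cis match exists.
  if (cis) "cis" else if (trans) "trans" else NA_character_
}

# Invented toy proteome.
proteome <- c(protA = "MMNTYGGGASPRFKLL",
              protB = "GGSLYNGG",
              protC = "PPTVATLPP")

cut_peptide("NTYASPRFK")
sapply(c("GGASPRFK", "NTYASPRFK", "SLYNTVATL", "WWWWWWWWW"),
       classify_peptide, proteome = proteome)
```

Peptides returning NA correspond to the spectra that are discarded because they match neither as linear nor as potentially spliced.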
The remaining spectra that were assigned neither as linear nor as potentially spliced (cis or trans) are then discarded. The result is a list of spectra along with their categorizations (linear, potentially cis- and potentially trans-spliced). Potentially cis- and trans-spliced peptides are then concatenated, broken into several "fake" proteins and added to the bottom of the reference proteome. The point of this last step is to create a merged proteome (consisting of the reference proteome and the hybrid proteome) to be used for a second database search. After the second database search, the checknetMHCpan function or the step2_wo_netMHCpan function can be used to obtain the final list of potentially spliced peptides.
Article: Faridi P, Li C, Ramarathinam SH, Vivian JP, Illing PT, Mifsud NA, Ayala R, Song J, Gearing LJ, Hertzog PJ, Ternette N, Rossjohn J, Croft NP, Purcell AW. A subset of HLA-I peptides are not genomically templated: Evidence for cis- and trans-spliced peptide ligands. Sci Immunol. 2018 Oct 12;3(28):eaar3947. <doi:10.1126/sciimmunol.aar3947>. PMID: 30315122.
The output is a list of 3 elements containing:
the HybridFinder output (dataframe) - the spectra that made it to the end with their respective columns (ALC, m/z, RT, Fraction, Scan) and a categorization column which denotes their potential splice type (-cis, -trans) or whether they are linear (the entire sequence was matched in proteins in the proteome database). Potential cis- and trans-spliced peptides are peptides whose fragments were matched with fragments within one protein or two proteins, respectively.
character vector containing potentially hybrid peptides (cis- and trans-)
list containing the reference proteome and the "fake" proteins added at the end with a patterned naming convention (sp|denovo_HF_fake_protein) made up of the concatenated potential hybrid peptides.
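The construction of the "fake" proteins can be pictured with the short sketch below. How many peptides the package concatenates per fake protein is not stated here, so the peptides_per_protein argument, like the helper name and the reference entry, is purely hypothetical.

```r
# Concatenate hybrid peptides into chunks and name each chunk as a fake protein.
build_fake_proteins <- function(hybrid_peptides, peptides_per_protein = 2) {
  # Assign consecutive peptides to groups 1, 1, 2, 2, 3, ...
  group_id <- ceiling(seq_along(hybrid_peptides) / peptides_per_protein)
  groups <- split(hybrid_peptides, group_id)
  seqs <- vapply(groups, paste, character(1), collapse = "")
  names(seqs) <- paste0("sp|denovo_HF_fake_protein", seq_along(seqs))
  seqs
}

hybrid_peptides <- c("NTYASPRFK", "SLYNTVATL", "AAAWYLWEV")
fake_proteins <- build_fake_proteins(hybrid_peptides)

# Merged proteome: invented reference entry first, fake proteins at the end.
reference <- c("sp|P00001|REF1_HUMAN" = "MKTAYIAKQR")
merged_proteome <- c(reference, fake_proteins)
merged_proteome
```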
## Not run:
hybridFinderResult_list <- HybridFinder(denovo_candidates, db_search, proteome, export_files = TRUE, export_dir = output_dir)
hybridFinderResult_list <- HybridFinder(denovo_candidates, db_search, proteome)
hybridFinderResult_list <- HybridFinder(denovo_candidates, db_search, proteome, export_files = FALSE)
## End(Not run)
This function checks the supplied alleles against the list of alleles read by netMHCpan. The list was retrieved by reading the file exported from netMHCpan using the following command line: "netMHCpan -listMHC"
mhc_check(netmhcpan_alleles)
netmhcpan_alleles |
the netmhcpan alleles to be used for the netmhcpan call. |
A custom error is printed if an allele is not written correctly.
returns a custom error message if the MHC/HLA allele(s) are not written correctly
returns nothing if there are no issues
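A check of this kind can be sketched in a few lines of base R. The function name allele_check_sketch, the three-allele stand-in list and the wording of the error message are assumptions for illustration; the package itself ships the full allele list as netmhcpan_list_alleles.

```r
# Tiny stand-in for the full list of alleles accepted by netMHCpan.
accepted_alleles <- c("HLA-A02:01", "HLA-A03:01", "HLA-A24:02")

allele_check_sketch <- function(alleles, accepted = accepted_alleles) {
  bad <- setdiff(alleles, accepted)
  if (length(bad) > 0) {
    # Stop with a message naming every allele that is not in the list.
    stop("Allele(s) not written in the accepted netMHCpan format: ",
         paste(bad, collapse = ", "), call. = FALSE)
  }
  invisible(NULL)  # nothing is returned when all alleles are valid
}

# A correctly written allele passes silently.
allele_check_sketch("HLA-A02:01")

# A wrongly written allele raises an error, captured here as a message.
msg <- tryCatch(allele_check_sketch("HLA-A0201"),
                error = function(e) conditionMessage(e))
msg
```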
if (interactive()) {
  mhc_check("HLA-A02:01")
  mhc_check("HLA-A0201")
}
the list of alleles in the acceptable format for netMHCpan
netmhcpan_list_alleles
A data frame with 1024 rows and 1 variable:
V1
character the alleles
This function helps retrieve the categorizations for the peptides from step 1 and apply them to those that are matched in the second database search.
step2_wo_netMHCpan( peptide_rerun, HF_step1_output, export_files = FALSE, export_dir = NULL )
peptide_rerun |
dataframe containing the results of the second database PEAKS search. |
HF_step1_output |
the HybridFinder output containing the potential splicing categorizations obtained with the HybridFinder function (HybridFinder) based on the matching of fragment pairs of peptides in 1 or 2 proteins. This parameter can be provided either by loading the exported .csv file or, if the results object is still in the global environment (e.g. results_HF_Exp1), by simply writing "results_HF_Exp1[[1]]". |
export_files |
a boolean parameter for exporting the dataframes as files to the output directory given in the next parameter, Default: FALSE |
export_dir |
the output directory for the results files if export_files=TRUE, Default: NULL |
In cases where the PC runs on Windows OS, only the web version of netMHCpan can be used, so this function returns the peptide input file for the web version of netMHCpan. This function also outputs the database search rerun results with their categorizations (potentially cis and potentially trans) obtained from the first step (HybridFinder).
the input file for the web version of netMHCpan (dataframe)
the database search rerun with the categorizations already determined in the previous step. (character vector)
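Carrying the step 1 categorizations over to the second database search amounts to a join on the peptide sequence, which can be sketched in base R as follows. The column name Potential_spliceType and the toy tables are assumptions for illustration, and keeping unmatched rerun peptides with an NA category is a choice made for this sketch rather than documented package behaviour.

```r
# Toy step 1 output: peptides with their categorization.
step1 <- data.frame(
  Peptide = c("NTYASPRFK", "SLYNTVATL", "GGASPRFK"),
  Potential_spliceType = c("cis", "trans", "Linear"),
  stringsAsFactors = FALSE
)

# Toy results of the second database search.
rerun <- data.frame(
  Peptide = c("NTYASPRFK", "KLGGALQAK", "SLYNTVATL"),
  Length  = c(9L, 9L, 9L),
  stringsAsFactors = FALSE
)

# Left join: every rerun peptide is kept, categories are added where available.
categorized <- merge(rerun, unique(step1), by = "Peptide", all.x = TRUE)

# Peptide list that could be pasted into the web version of netMHCpan.
netmhcpan_input <- unique(categorized$Peptide)

categorized
netmhcpan_input
```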
if (interactive()) {
  data(package = "RHybridFinder", "db_rerun_Human_Liver_AUTD17")
  results_step2_Human_Liver_AUTD17 <- step2_wo_netMHCpan(db_rerun_Human_Liver_AUTD17, results_HybridFinder_Human_Liver_AUTD17[[1]])
}