Title: | Searching Shared HLA Amino Acid Residue Prevalence |
---|---|
Description: | Processes amino acid alignments produced by the 'IPD-IMGT/HLA (Immuno Polymorphism-ImMunoGeneTics/Human Leukocyte Antigen) Database' to identify user-defined amino acid residue motifs shared across HLA alleles, and calculates the frequencies of those motifs based on HLA allele frequency data. 'SSHAARP' (Searching Shared HLA Amino Acid Residue Prevalence) uses 'Generic Mapping Tools (GMT)' software and the 'GMT' R package to generate global frequency heat maps that illustrate the distribution of each user-defined motif around the globe. 'SSHAARP' analyzes the allele frequency data described by Solberg et al. (2008) <doi:10.1016/j.humimm.2008.05.001>, a global set of 497 population samples from 185 published datasets, representing 66,800 individuals total. |
Authors: | Livia Tran [aut, cre], Steven Mack [aut], Josh Bredeweg [ctb], Dale Steinhardt [ctb] |
Maintainer: | Livia Tran <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.0 |
Built: | 2024-11-08 06:44:43 UTC |
Source: | CRAN |
A list object containing exon boundaries for all exons in protein-coding genes in the IPD-IMGT/HLA Database release 3.39.0. Exon boundaries were determined from the nucleotide alignment files, which were downloaded from the ANHIG/IMGTHLA GitHub repository.
AA_atlas
A list containing exon boundaries in a dataframe format for each locus.
For internal use only.
https://github.com/ANHIG/IMGTHLA/tree/Latest/alignments
Extracts peptide alignment data for one or more user-defined HLA loci from the ANHIG/IMGTHLA GitHub repository and produces, for each locus, a data frame of the amino acid found at each alignment position for every allele. The first four columns are locus, allele, trimmed allele, and allele_name.
BLAASD(loci)
loci |
A vector of un-prefixed HLA locus names |
A list object of data frames, one for each specified locus. Each list element is a data frame of allele names and the corresponding peptide sequence for each amino acid position. An error message is returned if the loci input is not a locus for which peptide alignments are available in the ANHIG/IMGTHLA GitHub repository.
# BLAASD with one locus as input
BLAASD("C")

# BLAASD with multiple loci as input
BLAASD(c("A", "B", "C"))
Checks input motif for errors in format and amino acid positions not present in the locus alignment.
checkMotif(motif)
motif |
An amino acid motif in the following format: Locus*##$~##$~##$, where ## identifies a peptide position, and $ identifies an amino acid residue. Motifs can include any number of amino acids. |
A warning message is returned if the input motif is formatted incorrectly or contains an amino acid position not present in the alignment. Otherwise, a list object containing the extracted locus information, a correctly formatted motif, and a locus-specific amino acid data frame is returned. Note that checkMotif() does not check the amino acid variants in a specified motif; that is done by findMotif().
For internal SSHAARP use only.
# Example where a motif is formatted correctly
checkMotif("DRB1*26F~28E~30Y")

# Example where the format is incorrect
checkMotif("DRB1**26F~28E~30Y")

# Example where an amino acid position does not exist
checkMotif("DRB1*26F~28E~300000Y")
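The Locus*##$~##$~##$ format can be illustrated with a minimal base R sketch. This is not the checkMotif() implementation: the helper name parse_motif() and its return structure are hypothetical, and the sketch checks only the string format, not whether a position exists in an alignment.

```r
# Hypothetical helper (not part of SSHAARP): split a motif string of the
# form Locus*##$~##$~##$ into its locus, positions, and residues.
parse_motif <- function(motif) {
  # Exactly one "*" must separate the locus from the position/residue pairs
  parts <- strsplit(motif, "*", fixed = TRUE)[[1]]
  if (length(parts) != 2 || any(parts == "")) {
    stop("motif must contain exactly one '*' between locus and residues")
  }
  pairs <- strsplit(parts[2], "~", fixed = TRUE)[[1]]
  # Each pair must be a run of digits followed by one uppercase letter
  ok <- grepl("^[0-9]+[A-Z]$", pairs)
  if (!all(ok)) {
    stop("malformed position/residue pair: ",
         paste(pairs[!ok], collapse = ", "))
  }
  list(
    locus     = parts[1],
    positions = as.integer(sub("[A-Z]$", "", pairs)),
    residues  = sub("^[0-9]+", "", pairs)
  )
}

m <- parse_motif("DRB1*26F~28E~30Y")
m$locus
m$positions
m$residues
```

A doubled asterisk, as in "DRB1**26F~28E~30Y", splits into three parts and therefore fails the first check in this sketch.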
Returns a modified version of the Solberg dataset that includes a column of locus*allele names, is sorted by population name, and is reduced to the specified locus. Cardinal coordinates are converted to their Cartesian equivalents (e.g. 50S is converted to -50).
dataSubset(motif, filename = SSHAARP::solberg_dataset)
motif |
An amino acid motif in the following format: Locus*##$~##$~##$, where ## identifies a peptide position, and $ identifies an amino acid residue. Motifs can include any number of amino acids. |
filename |
The filename of a local copy of the Solberg dataset. The default is the solberg_dataset included in the SSHAARP package. |
A data frame containing a reformatted version of the Solberg dataset, with rows ordered by population name, Cartesian coordinates in the latit and longit columns, and limited to populations with data for the specified locus. If a motif has formatting errors, a warning message is returned.
For internal SSHAARP use only.
The Solberg dataset is the tab-delimited ‘1-locus-alleles.dat’ text file in the results.zip archive at http://pypop.org/popdata/.
The Solberg dataset is also prepackaged into SSHAARP as 'solberg_dataset'.
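The coordinate conversion described for dataSubset() can be sketched in base R as follows. The helper name cardinal_to_signed() is hypothetical, and the sketch assumes each coordinate is a string made of a number followed by a single hemisphere letter (N, S, E, or W); it is not the package's own code.

```r
# Hypothetical helper (not part of SSHAARP): convert cardinal coordinates
# such as "50S" or "120.5E" to signed numeric values.
cardinal_to_signed <- function(x) {
  # The last character is assumed to be the hemisphere letter
  hemisphere <- toupper(substr(x, nchar(x), nchar(x)))
  magnitude  <- as.numeric(substr(x, 1, nchar(x) - 1))
  # South and West become negative; North and East stay positive
  ifelse(hemisphere %in% c("S", "W"), -magnitude, magnitude)
}

coords <- cardinal_to_signed(c("50S", "50N", "120.5E", "75W"))
coords
```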
Consumes the alignment data frame produced by BLAASD() and returns an alignment data frame of alleles that share a specific amino acid motif.
findMotif(motif)
motif |
An amino acid motif in the following format: Locus*##$~##$~##$, where ## identifies a peptide position, and $ identifies an amino acid residue. Motifs can include any number of amino acids. |
An amino acid alignment dataframe of alleles that share the specified motif. If the motif is not found in any alleles, or the motif has formatting errors, a warning message is returned.
# Examples with actual motifs
findMotif("DRB1*26F~28E~30Y")
findMotif("DRB1*26F~28E")

# Example with a non-existent motif
findMotif("DRB1*26F~28E~30Z")

# Extracting the names of alleles with a user-defined motif
findMotif("DRB1*26F~28E~30Y")[,4]
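The idea of reducing an alignment to the alleles that share a motif can be sketched on a toy data frame. Everything here is illustrative: the allele names and residues are made up, the helper filter_by_motif() is hypothetical, and the layout (one column per position, named by position number) is an assumption rather than the exact structure returned by BLAASD().

```r
# Toy alignment; allele names and residues are made up for illustration.
alignment <- data.frame(
  allele_name = c("X*01:01", "X*01:02", "X*02:01"),
  `26` = c("F", "F", "L"),
  `28` = c("E", "D", "E"),
  `30` = c("Y", "Y", "Y"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose residues match at every motif position
filter_by_motif <- function(aln, positions, residues) {
  keep <- rep(TRUE, nrow(aln))
  for (i in seq_along(positions)) {
    # Narrow the selection one position at a time
    keep <- keep & aln[[as.character(positions[i])]] == residues[i]
  }
  aln[keep, , drop = FALSE]
}

hits <- filter_by_motif(alignment, c(26, 28, 30), c("F", "E", "Y"))
hits$allele_name
```

In this sketch a motif that no allele carries yields a data frame with zero rows, whereas findMotif() is documented to return a warning message in that case.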
A list object containing protein alignments for all protein-coding genes in the IPD-IMGT/HLA Database release. Alignments were downloaded from the ANHIG/IMGTHLA GitHub repository.
IMGTprotalignments
A list containing protein alignments in a dataframe format for each locus.
https://github.com/ANHIG/IMGTHLA/tree/Latest/alignments
Produces a frequency heatmap for a specified amino-acid motif, based on the allele frequency data in the Solberg dataset.
PALM(motif, filename = SSHAARP::solberg_dataset, direct = getwd(), color = TRUE, filterMigrant = TRUE)
motif |
An amino acid motif in the following format: Locus*##$~##$~##$, where ## identifies a peptide position, and $ identifies an amino acid residue. Motifs can include any number of amino acids. |
filename |
The filename of a local copy of the Solberg dataset. The default is the solberg_dataset included in the SSHAARP package. |
direct |
The directory into which the map produced is written. The default directory is the user's working directory. |
color |
A logical parameter that identifies if the heat maps should be made in color (TRUE) or gray scale (FALSE). The default option is TRUE. |
filterMigrant |
A logical parameter that determines if admixed populations (OTH) and migrant populations (i.e. any population whose complexity value includes 'mig') should be excluded from the dataset. The default option is TRUE. |
The specified motif and the directory into which the heatmap was written are returned in an invisible character vector. If the user enters a motif that is not found in the Solberg dataset, or that does not exist, a warning message is returned. If an incorrectly formatted motif is entered, or the user does not have the GMT software installed on their operating system, a vector with a warning message is returned. The produced heatmap is written to the user's specified directory (default is user's working directory) as a .jpg file, where the filename is "'motif'.jpg".
The produced frequency heatmap is generated by using the Generic Mapping Tools (GMT) R Package, which is an interface between R and the GMT Map-Making software.
The Solberg dataset is the tab-delimited ‘1-locus-alleles.dat’ text file in the results.zip archive at http://pypop.org/popdata/.
The Solberg dataset is also prepackaged into SSHAARP as 'solberg_dataset'.
While the map legend identifies the highest frequency value, values in this range may not be represented on the map due to frequency averaging over neighboring populations.
Solberg et al. (2008) <doi:10.1016/j.humimm.2008.05.001>
# Example to produce a color frequency heat map where migrant populations are filtered out
PALM("DRB1*26F~28E~30Y", filename = solberg_dataset[85:100, ], filterMigrant = TRUE)

# Example to produce a grayscale heat map where migrant populations are not filtered out
PALM("DRB1*26F~28E~30Y", filename = solberg_dataset[85:100, ], color = FALSE, filterMigrant = FALSE)
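How a per-population motif frequency might be derived from allele frequencies can be sketched in base R. This rests on a stated assumption, not on the package source: that a motif's frequency in a population is the sum of the frequencies of the alleles carrying it. The column names popname, allele, and frequency, the allele names, and all values are made up for illustration.

```r
# Toy allele-frequency table; populations, alleles, and values are made up.
freqs <- data.frame(
  popname   = c("PopA", "PopA", "PopA", "PopB", "PopB"),
  allele    = c("X*01:01", "X*01:02", "X*02:01", "X*01:01", "X*02:01"),
  frequency = c(0.25, 0.5, 0.25, 0.125, 0.875),
  stringsAsFactors = FALSE
)

# Alleles assumed to carry the motif (e.g. names taken from a motif search)
motif_alleles <- c("X*01:01", "X*01:02")

# Assumption: motif frequency = sum of frequencies of motif-bearing alleles.
# Alleles without the motif contribute 0, so every population is retained.
freqs$motif_freq <- ifelse(freqs$allele %in% motif_alleles, freqs$frequency, 0)
motif_by_pop <- aggregate(motif_freq ~ popname, data = freqs, FUN = sum)
motif_by_pop
```

Setting non-matching alleles to zero instead of dropping their rows keeps populations in which the motif is absent, so they would appear with a frequency of 0 instead of disappearing from the result.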
A dataframe of the original Solberg dataset, which is a global dataset of 497 population samples from 185 published datasets, representing 66,800 individuals. For more information on the Solberg dataset, please see the vignette.
solberg_dataset
A dataframe with 20163 rows and 13 columns.
results.zip file from http://pypop.org/popdata/
Solberg et al. "Balancing selection and heterogeneity across the classical human leukocyte antigen loci: A meta-analytic review of 497 population studies". Human Immunology (2008) 69, 443–464.