Using haploR, an R package for querying HaploReg and RegulomeDB

Overview

HaploReg (Ward and Kellis 2011) and RegulomeDB (Boyle et al. 2012) are web-based tools that extract biological information such as eQTL, LD, motifs, etc. from large genomic projects such as ENCODE, the 1000 Genomes Project, the Roadmap Epigenomics Project and others. This is sometimes called “post stage GWAS” analysis.

The R-package haploR was developed to query those tools (HaploReg, RegulomeDB) directly from R in order to facilitate high-throughput genomic data analysis. Below we provide several examples that show how to work with this package.

Note: you must have a stable Internet connection to use this package.

Contact: for questions about using haploR or any other issues.

Motivation and typical analysis workflow

This package was inspired by the fact that many web-based post stage GWAS databases do not have an Application Programming Interface (API) and, therefore, do not allow users to query them remotely from the R environment. In our research we used the HaploReg and RegulomeDB web databases. These very useful databases show linkage disequilibrium between query variants and the variants in LD with them, expression quantitative trait loci (eQTL), changed motifs and other useful information. However, it is not easy to include this information in a streamlined workflow, since those tools also do not offer an API.

We developed a custom analysis pipeline which prepares data, performs genome-wide association (GWA) analysis and presents results in a user-friendly format. Results include a list of genetic variants (also known as ‘SNPs’, or single nucleotide polymorphisms), their corresponding p-values, the phenotypes (traits) tested and other meta-information such as LD, alternative allele, minor allele frequency, motifs changed, etc. Of course, we could go through the list of SNPs with genome-wide significant p-values (below 1e-8) and submit each of them to the web-based tools manually, one by one, but this is a time-consuming process and cannot be fully automated (which breaks one of the pipeline’s paradigms). It is especially difficult if the web site does not offer a way to download results.

Therefore, we developed haploR, a user-friendly R package that connects to the web tools from the R environment using the POST/GET methods and downloads results in a suitable R format. This package saved us significant time in developing the reporting system for our internal genomic analysis pipeline, and we would now like to present haploR to the research community.

An example of a typical analysis workflow is shown below.

Typical analysis workflow
  • Data preprocessing usually consists of basic cleaning operations (sex inconsistency checks, filtering by MAF and by missing genotype rate), format conversion, population stratification, creating temporary files, etc.
  • Genome-wide association study (GWAS). This includes testing hypotheses on the correlation between phenotype and genotype, usually in the form of linear or logistic regression, although the models can be quite sophisticated, especially for rare variants.
  • Postprocessing usually includes summarizing results in tables, creating graphics (Manhattan plot, QQ-plot), sometimes filtering by a significance threshold (usually 1E-8), removing temporary files, etc.
  • Post stage GWAS analysis: connect GWAS findings with existing studies and gather information about linked SNPs, chromatin state, protein binding annotation, sequence conservation across mammals, effects on regulatory motifs and on gene expression, etc. This helps researchers better understand the functions of the genome-wide significant SNPs they found and test additional hypotheses about them. At this stage haploR is especially useful because it provides a convenient R interface for mining web databases. This, in turn, streamlines the analysis workflow and therefore significantly reduces analysis time. Previously researchers had to do this manually after the analysis by exploring available web resources, checking each SNP of interest and downloading results (which is especially painful if the website does not have a download option). With haploR such post-GWAS information is easily retrieved, shared and included in the final output tables at the end of the analysis.
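
The hand-off from postprocessing to the post stage GWAS step can be sketched roughly as follows. The data frame `gwas`, its column names (`snp`, `p`), the third identifier and the helper `select_significant()` are hypothetical stand-ins for pipeline output and are not part of haploR; only `queryHaploreg()` comes from the package. The query itself needs an Internet connection, so it is guarded here.

```r
# Hypothetical GWAS summary table; column names and p-values are illustrative.
gwas <- data.frame(
  snp = c("rs10048158", "rs4791078", "rs_not_significant"),
  p   = c(3e-10, 5e-9, 0.02),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only variants below the significance
# threshold mentioned in the text (1e-8).
select_significant <- function(results, threshold = 1e-8) {
  results$snp[results$p < threshold]
}

hits <- select_significant(gwas)

# Annotate the hits with HaploReg. This requires network access and an
# installed haploR, so it is skipped in non-interactive runs.
if (interactive() && requireNamespace("haploR", quietly = TRUE)) {
  annotation <- haploR::queryHaploreg(query = hits)
}
```

Keeping the filtering step separate from the query makes it possible to test the selection logic offline.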

Installation of haploR package

In order to install the haploR package, the user must first install R (https://www.r-project.org). After that, haploR can be installed either:

  • From CRAN (stable version):
install.packages("haploR", dependencies = TRUE)
  • Or from the package web page (development version):
devtools::install_github("izhbannikov/haplor")

haploR depends on the following packages:

  • httr, version 1.2.1 or later.
  • XML, version 3.98-1.6 or later.
  • tibble, version 1.3.0 or later.
  • RUnit, version 0.4.31 or later.
  • DT, version 0.4 or later.
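
As a rough check, the installed versions can be compared against the minimums listed above. The helper `meets_minimum()` is a hypothetical convenience function, not part of haploR; it relies only on `requireNamespace()`, `utils::packageVersion()` and `package_version()` from base R.

```r
# Minimum versions as listed above.
required <- c(httr = "1.2.1", XML = "3.98-1.6", tibble = "1.3.0",
              RUnit = "0.4.31", DT = "0.4")

# Hypothetical helper: TRUE/FALSE per package, NA if it is not installed.
meets_minimum <- function(required) {
  vapply(names(required), function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
    utils::packageVersion(pkg) >= package_version(required[[pkg]])
  }, logical(1))
}

status <- meets_minimum(required)
status
```

Note that installing with `dependencies = TRUE`, as shown above, normally resolves these requirements automatically.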

Examples of usage

Querying HaploReg

Function

queryHaploreg(query = NULL, file = NULL, study = NULL, ldThresh = 0.8,
  ldPop = "EUR", epi = "vanilla", cons = "siphy", genetypes = "gencode",
  url = "https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php",
  timeout = 10, encoding = "UTF-8", verbose = FALSE)

queries HaploReg web-based tool and returns results.

Arguments

  • query: Query (a vector of rsIDs).
  • file: A text file (one refSNP ID per line).
  • study: A particular study. See function getStudyList(…). Default: NULL.
  • ldThresh: LD threshold, r2 (select NA to show only query variants). Default: 0.8.
  • ldPop: 1000G Phase 1 population for LD calculation. Can be: AFR (Africa), AMR (America), ASN (Asia). Default: EUR (Europe).
  • epi: Source for epigenomes. Possible values: vanilla for ChromHMM (Core 15-state model); imputed for ChromHMM (25-state model using 12 imputed marks); methyl for H3K4me1/H3K4me3 peaks; acetyl for H3K27ac/H3K9ac peaks. Default: vanilla.
  • cons: Mammalian conservation algorithm. Possible values: gerp for GERP (http://mendel.stanford.edu/SidowLab/downloads/gerp/), siphy for SiPhy-omega, both for both. Default: siphy.
  • genetypes: Show position relative to. Possible values: gencode for Gencode genes; refseq for RefSeq genes; both for both. Default: gencode.
  • url: HaploReg url address. Default: https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php
  • timeout: A timeout parameter for curl. Default: 10
  • encoding: Sets the encoding for correct retrieval of web-page content. Default: UTF-8
  • verbose: Verbose output. Default: FALSE.
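
A call with non-default arguments might look like the sketch below. The list `allowed` simply restates the values documented above, and `check_haploreg_args()` is a hypothetical helper (not part of haploR) that rejects undocumented values before any request is sent. The actual query is guarded because it needs network access.

```r
# Allowed values, copied from the argument list above.
allowed <- list(
  ldPop     = c("EUR", "AFR", "AMR", "ASN"),
  epi       = c("vanilla", "imputed", "methyl", "acetyl"),
  cons      = c("siphy", "gerp", "both"),
  genetypes = c("gencode", "refseq", "both")
)

# Hypothetical helper: stop early on a value that is not documented.
check_haploreg_args <- function(...) {
  args <- list(...)
  for (name in names(args)) {
    if (!args[[name]] %in% allowed[[name]]) {
      stop(sprintf("'%s' is not a documented value for %s",
                   args[[name]], name))
    }
  }
  invisible(args)
}

args <- check_haploreg_args(ldPop = "AFR", epi = "imputed", cons = "both")

# Query with a stricter LD threshold and the validated options.
if (interactive() && requireNamespace("haploR", quietly = TRUE)) {
  x <- do.call(haploR::queryHaploreg,
               c(list(query = "rs10048158", ldThresh = 0.9), args))
}
```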

Output

A data frame (table) wrapped into a tibble object that contains data extracted from the HaploReg site. The columns (35 in the example output below) are specific to HaploReg output. Below we describe the columns:

  • chr: Chromosome. Type: numeric.
  • pos_hg38: Position on the human genome, type: numeric.
  • r2: Linkage disequilibrium. Type: numeric.
  • D’: Linkage disequilibrium, alternative definition. Type: numeric.
  • is_query_snp: Indicator shows query SNP, 0 - not query SNP, 1 - query SNP. Type: numeric.
  • rsID: refSNP ID. Type: character.
  • ref: Reference allele. Type: character.
  • alt: Alternative allele. Type: character.
  • AFR: r2 calculated for Africa. Type: numeric.
  • AMR: r2 calculated for America. Type: numeric.
  • ASN: r2 calculated for Asia. Type: numeric.
  • EUR: r2 calculated for Europe. Type: numeric.
  • GERP_cons: GERP scores. Type: numeric.
  • SiPhy_cons: SiPhy scores. Type: numeric.
  • Chromatin_States: Chromatin states: reference epigenome identifiers (EID) of chromatin-associated proteins and histone modifications in that region. Type: character.
  • Chromatin_States_Imputed: Chromatin states based on imputed data. Type: character.
  • Chromatin_Marks: Chromatin marks. Type: character.
  • DNAse: DNAse. Type: character.
  • Proteins: A list of protein names. Type: character.
  • eQTL: Expression Quantitative Trait Loci. Type: character.
  • gwas: GWAS study name. Type: character.
  • grasp: GRASP study name. Type: character.
  • Motifs: Motif names. Type: character.
  • GENCODE_id: GENCODE transcript ID. Type: character.
  • GENCODE_name: GENCODE gene name. Type: character.
  • GENCODE_direction: GENCODE direction (transcription toward 3’ or 5’ end of the DNA sequence). Type: numeric.
  • GENCODE_distance: GENCODE distance. Type: numeric.
  • RefSeq_id: NCBI Reference Sequence Accession number. Type: character.
  • RefSeq_name: NCBI Reference Sequence name. Type: character.
  • RefSeq_direction: NCBI Reference Sequence direction (transcription toward 3’ or 5’ end of the DNA sequence). Type: numeric.
  • RefSeq_distance: NCBI Reference Sequence distance. Type: numeric.
  • dbSNP_functional_annotation: Annotated proteins associated with the SNP. Type: numeric.
  • query_snp_rsid: Query SNP rs ID. Type: character.
  • Promoter_histone_marks: Promoter histone marks. Type: factor.
  • Enhancer_histone_marks: Enhancer histone marks. Type: factor.

The number of rows is not constant: it is at least the number of query SNPs and depends on the r2 threshold chosen in the query (default 0.8). This means that the output contains not only the query SNPs, but also those SNPs that have r2 ≥ 0.8 with the query SNPs.
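
The split between query SNPs and LD proxies can be inspected with the is_query_snp and query_snp_rsid columns. The sketch below uses a small mock data frame with made-up identifiers and values, so that it runs without a network connection. It assumes that query_snp_rsid holds, for every row, the rsID of the query SNP that the row belongs to.

```r
# Mock of a few columns of a queryHaploreg() result (values are made up).
x_mock <- data.frame(
  rsID           = c("rsA", "rsB", "rsC", "rsD", "rsE"),
  r2             = c(1, 0.85, 0.92, 1, 0.81),
  is_query_snp   = c(1, 0, 0, 1, 0),
  query_snp_rsid = c("rsA", "rsA", "rsA", "rsD", "rsD"),
  stringsAsFactors = FALSE
)

# Rows that are query SNPs versus rows added because of LD.
n_query   <- sum(x_mock$is_query_snp == 1)
n_proxies <- sum(x_mock$is_query_snp == 0)

# Number of LD proxies contributed by each query SNP.
proxies_per_query <- table(x_mock$query_snp_rsid[x_mock$is_query_snp == 0])
proxies_per_query
```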

One or several genetic variants

library(haploR)
x <- queryHaploreg(query=c("rs10048158","rs4791078"))
x
## # A tibble: 33 × 35
##    chr     pos_hg38    r2  `D'` is_query_snp rsID  ref   alt     AFR   AMR   ASN
##    <chr>   <chr>    <dbl> <dbl>        <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl>
##  1 17      66240200  1     1               1 rs10… T     C      0.93  0.66  0.92
##  2 17      66240200  0.82  0.93            0 rs10… T     C      0.93  0.66  0.92
##  3 Array17 66231972  0.82  0.99            0 rs11… G     T      0.88  0.63  0.9 
##  4 Array17 66248387  0.99  1               0 rs12… T     C      0.93  0.66  0.92
##  5 Array17 66248387  0.81  0.92            0 rs12… T     C      0.93  0.66  0.92
##  6 17      66214285  0.98  0.99            0 rs19… G     C      0.86  0.67  0.91
##  7 17      66214285  0.83  0.93            0 rs19… G     C      0.86  0.67  0.91
##  8 Array17 66240359  0.83  1               0 rs20… A     G      0.64  0.6   0.91
##  9 Array17 66219777  0.82  0.99            0 rs22… A     G      0.89  0.63  0.89
## 10 17      66219453  0.83  0.93            0 rs22… G     A      0.91  0.67  0.91
## # ℹ 23 more rows
## # ℹ 24 more variables: EUR <dbl>, GERP_cons <chr>, SiPhy_cons <chr>,
## #   Chromatin_States <chr>, Chromatin_States_Imputed <chr>,
## #   Chromatin_Marks <chr>, DNAse <chr>, Proteins <chr>, eQTL <chr>, gwas <chr>,
## #   grasp <chr>, Motifs <chr>, GENCODE_id <chr>, GENCODE_name <chr>,
## #   GENCODE_direction <chr>, GENCODE_distance <chr>, RefSeq_id <chr>,
## #   RefSeq_name <chr>, RefSeq_direction <chr>, RefSeq_distance <chr>, …

Here query is a vector with names of genetic variants.

We then can create a subset from the results, for example, to choose only SNPs with r2 > 0.9:

if(length(x)!=0) {
  subset.high.LD <- x[as.numeric(x$r2) > 0.9, c("rsID", "r2", "chr", "pos_hg38", "is_query_snp", "ref", "alt")]
  subset.high.LD
}
## # A tibble: 13 × 7
##    rsID          r2 chr     pos_hg38 is_query_snp ref   alt  
##    <chr>      <dbl> <chr>   <chr>           <dbl> <chr> <chr>
##  1 rs10048158  1    17      66240200            1 T     C    
##  2 rs12603947  0.99 Array17 66248387            0 T     C    
##  3 rs1971682   0.98 17      66214285            0 G     C    
##  4 rs2215415   0.99 17      66219453            0 G     A    
##  5 rs3744317   0.99 Array17 66220526            0 G     A    
##  6 rs4366742   0.99 Array17 66216124            0 T     C    
##  7 rs4790914   1    17      66213160            0 C     G    
##  8 rs4791078   1    17      66213896            1 A     C    
##  9 rs4791079   1    17      66213422            0 T     G    
## 10 rs71160546  0.94 Array17 66230111            0 GA    G    
## 11 rs7342920   0.99 Array17 66248527            0 T     G    
## 12 rs8178827   0.96 Array17 66227121            0 C     T    
## 13 rs9895261   1    17      66244318            0 A     G
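
In the output above, pos_hg38 is a character column, so it has to be converted before any numeric operation such as sorting by position. A minimal sketch, using three rows copied from the table above in a plain data frame (the variable names `hits` and `span_bp` are illustrative):

```r
# Three rows copied from the output above; pos_hg38 is character there.
hits <- data.frame(
  rsID     = c("rs10048158", "rs1971682", "rs4790914"),
  pos_hg38 = c("66240200", "66214285", "66213160"),
  stringsAsFactors = FALSE
)

# Convert to numeric, then order by genomic position.
hits$pos_hg38 <- as.numeric(hits$pos_hg38)
hits <- hits[order(hits$pos_hg38), ]

# Distance in base pairs between the first and last variant.
span_bp <- diff(range(hits$pos_hg38))
```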

We can then save the subset.high.LD into an Excel workbook:

require(openxlsx)
write.xlsx(x=subset.high.LD, file="subset.high.LD.xlsx")

This was an example of gathering post-GWAS information directly from the online tool. haploR has an additional advantage: it downloads the full information that HaploReg returns for the query. For example, if you go online and submit these two SNPs to HaploReg (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php), you will see that some cells of the columns “Motifs changed” and “Selected eQTL hits” are hidden (only the number of hits is given). haploR retrieves this information in the form of a data frame, which can be saved into an Excel file.

if(length(x)!=0) {
    # Inside a block only the last value is auto-printed, so print explicitly.
    print(x[, c("Motifs", "rsID")])
    print(x[, c("eQTL", "rsID")])
}

Uploading file with variants

If you have a file with your SNPs you would like to analyze, you can supply it as an input as follows:

library(haploR)
x <- queryHaploreg(file=system.file("extdata/snps.txt", package = "haploR"))
x

File “snps.txt” is a text file which contains one rs-ID per line:

rs10048158
rs4791078
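
If the SNPs are held in an R vector, such a file can be produced with `writeLines()`. The sketch below writes to a temporary file (the variable names are illustrative) and guards the query because it needs network access.

```r
# Write one rsID per line to a temporary text file.
snps <- c("rs10048158", "rs4791078")
snp_file <- tempfile(fileext = ".txt")
writeLines(snps, snp_file)

# Pass the file to HaploReg; skipped when haploR or a session is unavailable.
if (interactive() && requireNamespace("haploR", quietly = TRUE)) {
  x <- haploR::queryHaploreg(file = snp_file)
}
```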

Using existing studies

Sometimes one would like to explore results from a study that has already been performed. In this case you should first explore the existing studies on the HaploReg web site (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php) and then use one of them as an input parameter. See the example below:

library(haploR)
# Getting a list of existing studies:
studies <- getStudyList()
# Let us look at the first element:
if(!is.null(studies)) {
    # Inside a block only the last value is auto-printed, so print explicitly.
    print(studies[[1]])
    # Let us look at the second element:
    print(studies[[2]])
    # Query HaploReg to explore results from
    # this study:
    x <- queryHaploreg(study=studies[[1]])
    x
}

Extended view of SNP

If you want to see more information about a SNP of interest, you can use the getExtendedView function.

Parameters

  • snp: A SNP of interest.
  • url: A URL to HaploReg. Default: https://pubs.broadinstitute.org/mammals/haploreg/detail_v4.1.php?query=&id=

Return

A list of tables t1, t2, …, etc., depending on the information contained in the HaploReg database.

Example:

library(haploR)
tables <- getExtendedView(snp="rs10048158")
tables

Querying RegulomeDB

To query RegulomeDB use this function:

queryRegulome(query = NULL, 
              genomeAssembly = "GRCh37",
              limit = 1000,
              timeout = 100)

This function queries the RegulomeDB web-based tool (https://www.regulomedb.org/regulome-search/) and returns results in a named list.

Arguments

  • query: Query (a vector of rsIDs).
  • genomeAssembly: Genome assembly build: can be GRCh37 or GRCh38.
  • limit: Controls how many variants will be queried and returned for a large region. It can be a number (1000 by default) or “all”. Please note that a large number or “all” may cause trouble: the request could time out or may even crash the server.
  • timeout: A ‘timeout’ parameter for ‘curl’. Default: 100
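
One way to lower the risk of timeouts with long lists of rsIDs is to send them in smaller batches. The sketch below is an assumption about usage, not a documented haploR feature: `split_into_batches()` is a hypothetical helper, the batch size of 100 is arbitrary, and the identifiers are placeholders. The queries are guarded because they need network access, and each one is wrapped in `tryCatch()` so that a failed batch returns NULL instead of stopping the loop.

```r
# Hypothetical helper (not part of haploR): split IDs into batches.
split_into_batches <- function(ids, batch_size = 100) {
  split(ids, ceiling(seq_along(ids) / batch_size))
}

ids <- sprintf("rs%d", 1:250)  # placeholder identifiers for illustration
batches <- split_into_batches(ids, batch_size = 100)
lengths(batches)

# Query each batch separately; a failed batch yields NULL.
if (interactive() && requireNamespace("haploR", quietly = TRUE)) {
  results <- lapply(batches, function(b) {
    tryCatch(haploR::queryRegulome(query = b, timeout = 100),
             error = function(e) NULL)
  })
}
```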

Output

A data frame (table) OR a list with the following items:

  • query_coordinates
  • features
  • regulome_score
  • variants
  • nearby_snps
  • assembly

Example

library(haploR)

# With RsIDs:
x <- queryRegulome(c("rs4791078","rs10048158"))
x

# With region:
y <- queryRegulome("chr1:39492461-39492462")
y
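
The region string in the example above follows the pattern chr<chromosome>:<start>-<end>. When coordinates are stored as separate values, the string can be assembled with `sprintf()`. The helper `make_region()` is hypothetical and not part of haploR.

```r
# Hypothetical helper: build a region string such as "chr1:100-200".
make_region <- function(chr, start, end) {
  stopifnot(start <= end)
  sprintf("chr%s:%d-%d", chr, as.integer(start), as.integer(end))
}

region <- make_region(1, 39492461, 39492462)
region

# Use the assembled string as a query; skipped without haploR or a session.
if (interactive() && requireNamespace("haploR", quietly = TRUE)) {
  y <- haploR::queryRegulome(region)
}
```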

Roadmap cell types

The Roadmap Epigenomics project employs data organization schema based on anatomical origin of the data (see Roadmap Epigenomics web site).

EID GROUP ANATOMY GR Epigenome Mnemonic Standardized Epigenome name (from Wouter) Epigenome name (from EDACC Release 9 directory) TYPE
E033 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD3.CPC Primary T cells from cord blood CD3_Primary_Cells_Cord_BI PrimaryCell
E034 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD3.PPC Primary T cells from peripheral blood CD3_Primary_Cells_Peripheral_UW PrimaryCell
E037 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD4.MPC Primary T helper memory cells from peripheral blood 2 CD4_Memory_Primary_Cells PrimaryCell
E038 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD4.NPC Primary T helper naive cells from peripheral blood CD4_Naive_Primary_Cells PrimaryCell
E039 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD4.CD25M.CD45RA.NPC Primary T helper naive cells from peripheral blood CD4+_CD25-_CD45RA+_Naive_Primary_Cells PrimaryCell
E040 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD4.CD25M.CD45RO.MPC Primary T helper memory cells from peripheral blood 1 CD4+_CD25-_CD45RO+_Memory_Primary_Cells PrimaryCell
E041 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD4.CD25M.IL17M.PL.TPC Primary T helper cells PMA-I stimulated CD4+_CD25-_IL17-_PMA-Ionomycin_stimulated_MACS_purified_Th_Primary_Cells PrimaryCell
E042 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD4.CD25M.IL17P.PL.TPC Primary T helper 17 cells PMA-I stimulated CD4+_CD25-_IL17+_PMA-Ionomcyin_stimulated_Th17_Primary_Cells PrimaryCell
E043 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD4.CD25M.TPC Primary T helper cells from peripheral blood CD4+_CD25-_Th_Primary_Cells PrimaryCell
E044 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD4.CD25.CD127M.TREGPC Primary T regulatory cells from peripheral blood CD4+_CD25+_CD127-_Treg_Primary_Cells PrimaryCell
E045 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD4.CD25I.CD127.TMEMPC Primary T cells effector/memory enriched from peripheral blood CD4+_CD25int_CD127+_Tmem_Primary_Cells PrimaryCell
E047 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD8.NPC Primary T CD8+ naive cells from peripheral blood CD8_Naive_Primary_Cells PrimaryCell
E048 Blood & T-cell BLOOD Blood_and_Tcell BLD.CD8.MPC Primary T CD8+ memory cells from peripheral blood CD8_Memory_Primary_Cells PrimaryCell
E062 Blood & T-cell BLOOD Blood_and_Tcell BLD.PER.MONUC.PC Primary mononuclear cells from peripheral blood Peripheral_Blood_Mononuclear_Primary_Cells PrimaryCell
E053 Neurosph BRAIN Brain BRN.CRTX.DR.NRSPHR Cortex derived primary cultured neurospheres Neurosphere_Cultured_Cells_Cortex_Derived PrimaryCulture
E054 Neurosph BRAIN Brain BRN.GANGEM.DR.NRSPHR Ganglion Eminence derived primary cultured neurospheres Neurosphere_Cultured_Cells_Ganglionic_Eminence_Derived PrimaryCulture
E067 Brain BRAIN Brain BRN.ANG.GYR Brain Angular Gyrus Brain_Angular_Gyrus PrimaryTissue
E068 Brain BRAIN Brain BRN.ANT.CAUD Brain Anterior Caudate Brain_Anterior_Caudate PrimaryTissue
E069 Brain BRAIN Brain BRN.CING.GYR Brain Cingulate Gyrus Brain_Cingulate_Gyrus PrimaryTissue
E070 Brain BRAIN Brain BRN.GRM.MTRX Brain Germinal Matrix Brain_Germinal_Matrix PrimaryTissue
E071 Brain BRAIN Brain BRN.HIPP.MID Brain Hippocampus Middle Brain_Hippocampus_Middle PrimaryTissue
E072 Brain BRAIN Brain BRN.INF.TMP Brain Inferior Temporal Lobe Brain_Inferior_Temporal_Lobe PrimaryTissue
E073 Brain BRAIN Brain BRN.DL.PRFRNTL.CRTX Brain_Dorsolateral_Prefrontal_Cortex Brain_Mid_Frontal_Lobe PrimaryTissue
E074 Brain BRAIN Brain BRN.SUB.NIG Brain Substantia Nigra Brain_Substantia_Nigra PrimaryTissue
E081 Brain BRAIN Brain BRN.FET.M Fetal Brain Male Fetal_Brain_Male PrimaryTissue
E082 Brain BRAIN Brain BRN.FET.F Fetal Brain Female Fetal_Brain_Female PrimaryTissue
E075 Digestive GI_COLON Digestive GI.CLN.MUC Colonic Mucosa Colonic_Mucosa PrimaryTissue
E077 Digestive GI_DUODENUM Digestive GI.DUO.MUC Duodenum Mucosa Duodenum_Mucosa PrimaryTissue
E079 Digestive GI_ESOPHAGUS Digestive GI.ESO Esophagus Esophagus PrimaryTissue
E084 Digestive GI_INTESTINE Digestive GI.L.INT.FET Fetal Intestine Large Fetal_Intestine_Large PrimaryTissue
E085 Digestive GI_INTESTINE Digestive GI.S.INT.FET Fetal Intestine Small Fetal_Intestine_Small PrimaryTissue
E092 Digestive GI_STOMACH Digestive GI.STMC.FET Fetal Stomach Fetal_Stomach PrimaryTissue
E094 Digestive GI_STOMACH Digestive GI.STMC.GAST Gastric Gastric PrimaryTissue
E101 Digestive GI_RECTUM Digestive GI.RECT.MUC.29 Rectal Mucosa Donor 29 Rectal_Mucosa.Donor_29 PrimaryTissue
E102 Digestive GI_RECTUM Digestive GI.RECT.MUC.31 Rectal Mucosa Donor 31 Rectal_Mucosa.Donor_31 PrimaryTissue
E106 Digestive GI_COLON Digestive GI.CLN.SIG Sigmoid Colon Sigmoid_Colon PrimaryTissue
E109 Digestive GI_INTESTINE Digestive GI.S.INT Small Intestine Small_Intestine PrimaryTissue
E110 Digestive GI_STOMACH Digestive GI.STMC.MUC Stomach Mucosa Stomach_Mucosa PrimaryTissue
E114 ENCODE2012 LUNG ENCODE2012 LNG.A549.ETOH002.CNCR A549 EtOH 0.02pct Lung Carcinoma Cell Line A549_EtOH_0.02pct_Lung_Carcinoma CellLine
E115 ENCODE2012 BLOOD ENCODE2012 BLD.DND41.CNCR Dnd41 TCell Leukemia Cell Line Dnd41_TCell_Leukemia CellLine
E116 ENCODE2012 BLOOD ENCODE2012 BLD.GM12878 GM12878 Lymphoblastoid Cells GM12878_Lymphoblastoid PrimaryCulture
E117 ENCODE2012 CERVIX ENCODE2012 CRVX.HELAS3.CNCR HeLa-S3 Cervical Carcinoma Cell Line HeLa-S3_Cervical_Carcinoma CellLine
E118 ENCODE2012 LIVER ENCODE2012 LIV.HEPG2.CNCR HepG2 Hepatocellular Carcinoma Cell Line HepG2_Hepatocellular_Carcinoma CellLine
E119 ENCODE2012 BREAST ENCODE2012 BRST.HMEC HMEC Mammary Epithelial Primary Cells HMEC_Mammary_Epithelial PrimaryCulture
E120 ENCODE2012 MUSCLE ENCODE2012 MUS.HSMM HSMM Skeletal Muscle Myoblasts Cells HSMM_Skeletal_Muscle_Myoblasts PrimaryCulture
E121 ENCODE2012 MUSCLE ENCODE2012 MUS.HSMMT HSMM cell derived Skeletal Muscle Myotubes Cells HSMMtube_Skeletal_Muscle_Myotubes_Derived_from_HSMM PrimaryCulture
E122 ENCODE2012 VASCULAR ENCODE2012 VAS.HUVEC HUVEC Umbilical Vein Endothelial Primary Cells HUVEC_Umbilical_Vein_Endothelial_Cells PrimaryCulture
E123 ENCODE2012 BLOOD ENCODE2012 BLD.K562.CNCR K562 Leukemia Cells K562_Leukemia PrimaryCulture
E124 ENCODE2012 BLOOD ENCODE2012 BLD.CD14.MONO Monocytes-CD14+ RO01746 Primary Cells Monocytes-CD14+_RO01746 PrimaryCell
E125 ENCODE2012 BRAIN ENCODE2012 BRN.NHA NH-A Astrocytes Primary Cells NH-A_Astrocytes PrimaryCulture
E126 ENCODE2012 SKIN ENCODE2012 SKIN.NHDFAD NHDF-Ad Adult Dermal Fibroblast Primary Cells NHDF-Ad_Adult_Dermal_Fibroblasts PrimaryCulture
E127 ENCODE2012 SKIN ENCODE2012 SKIN.NHEK NHEK-Epidermal Keratinocyte Primary Cells NHEK-Epidermal_Keratinocytes PrimaryCulture
E128 ENCODE2012 LUNG ENCODE2012 LNG.NHLF NHLF Lung Fibroblast Primary Cells NHLF_Lung_Fibroblasts PrimaryCulture
E129 ENCODE2012 BONE ENCODE2012 BONE.OSTEO Osteoblast Primary Cells Osteoblasts PrimaryCulture
E027 Epithelial BREAST Epithelial BRST.MYO Breast Myoepithelial Primary Cells Breast_Myoepithelial_Cells PrimaryCell
E028 Epithelial BREAST Epithelial BRST.HMEC.35 Breast variant Human Mammary Epithelial Cells (vHMEC) Breast_vHMEC PrimaryCulture
E055 Epithelial SKIN Epithelial SKIN.PEN.FRSK.FIB.01 Foreskin Fibroblast Primary Cells skin01 Penis_Foreskin_Fibroblast_Primary_Cells_skin01 PrimaryCulture
E056 Epithelial SKIN Epithelial SKIN.PEN.FRSK.FIB.02 Foreskin Fibroblast Primary Cells skin02 Penis_Foreskin_Fibroblast_Primary_Cells_skin02 PrimaryCulture
E057 Epithelial SKIN Epithelial SKIN.PEN.FRSK.KER.02 Foreskin Keratinocyte Primary Cells skin02 Penis_Foreskin_Keratinocyte_Primary_Cells_skin02 PrimaryCulture
E058 Epithelial SKIN Epithelial SKIN.PEN.FRSK.KER.03 Foreskin Keratinocyte Primary Cells skin03 Penis_Foreskin_Keratinocyte_Primary_Cells_skin03 PrimaryCulture
E059 Epithelial SKIN Epithelial SKIN.PEN.FRSK.MEL.01 Foreskin Melanocyte Primary Cells skin01 Penis_Foreskin_Melanocyte_Primary_Cells_skin01 PrimaryCulture
E061 Epithelial SKIN Epithelial SKIN.PEN.FRSK.MEL.03 Foreskin Melanocyte Primary Cells skin03 Penis_Foreskin_Melanocyte_Primary_Cells_skin03 PrimaryCulture
E004 ES-deriv ESC_DERIVED ES-deriv ESDR.H1.BMP4.MESO H1 BMP4 Derived Mesendoderm Cultured Cells H1_BMP4_Derived_Mesendoderm_Cultured_Cells ESCDerived
E005 ES-deriv ESC_DERIVED ES-deriv ESDR.H1.BMP4.TROP H1 BMP4 Derived Trophoblast Cultured Cells H1_BMP4_Derived_Trophoblast_Cultured_Cells ESCDerived
E006 ES-deriv ESC_DERIVED ES-deriv ESDR.H1.MSC H1 Derived Mesenchymal Stem Cells H1_Derived_Mesenchymal_Stem_Cells ESCDerived
E007 ES-deriv ESC_DERIVED ES-deriv ESDR.H1.NEUR.PROG H1 Derived Neuronal Progenitor Cultured Cells H1_Derived_Neuronal_Progenitor_Cultured_Cells ESCDerived
E009 ES-deriv ESC_DERIVED ES-deriv ESDR.H9.NEUR.PROG H9 Derived Neuronal Progenitor Cultured Cells H9_Derived_Neuronal_Progenitor_Cultured_Cells ESCDerived
E010 ES-deriv ESC_DERIVED ES-deriv ESDR.H9.NEUR H9 Derived Neuron Cultured Cells H9_Derived_Neuron_Cultured_Cells ESCDerived
E011 ES-deriv ESC_DERIVED ES-deriv ESDR.CD184.ENDO hESC Derived CD184+ Endoderm Cultured Cells hESC_Derived_CD184+_Endoderm_Cultured_Cells ESCDerived
E012 ES-deriv ESC_DERIVED ES-deriv ESDR.CD56.ECTO hESC Derived CD56+ Ectoderm Cultured Cells hESC_Derived_CD56+_Ectoderm_Cultured_Cells ESCDerived
E013 ES-deriv ESC_DERIVED ES-deriv ESDR.CD56.MESO hESC Derived CD56+ Mesoderm Cultured Cells hESC_Derived_CD56+_Mesoderm_Cultured_Cells ESCDerived
E001 ESC ESC ESC ESC.I3 ES-I3 Cells ES-I3_Cell_Line PrimaryCulture
E002 ESC ESC ESC ESC.WA7 ES-WA7 Cells ES-WA7_Cell_Line PrimaryCulture
E003 ESC ESC ESC ESC.H1 H1 Cells H1_Cell_Line PrimaryCulture
E008 ESC ESC ESC ESC.H9 H9 Cells H9_Cell_Line PrimaryCulture
E014 ESC ESC ESC ESC.HUES48 HUES48 Cells HUES48_Cell_Line PrimaryCulture
E015 ESC ESC ESC ESC.HUES6 HUES6 Cells HUES6_Cell_Line PrimaryCulture
E016 ESC ESC ESC ESC.HUES64 HUES64 Cells HUES64_Cell_Line PrimaryCulture
E024 ESC ESC ESC ESC.4STAR ES-UCSF4 Cells 4star PrimaryCulture
E065 Heart VASCULAR Heart VAS.AOR Aorta Aorta PrimaryTissue
E083 Heart HEART Heart HRT.FET Fetal Heart Fetal_Heart PrimaryTissue
E095 Heart HEART Heart HRT.VENT.L Left Ventricle Left_Ventricle PrimaryTissue
E104 Heart HEART Heart HRT.ATR.R Right Atrium Right_Atrium PrimaryTissue
E105 Heart HEART Heart HRT.VNT.R Right Ventricle Right_Ventricle PrimaryTissue
E029 HSC & B-cell BLOOD HSC_and_Bcell BLD.CD14.PC Primary monocytes from peripheral blood CD14_Primary_Cells PrimaryCell
E030 HSC & B-cell BLOOD HSC_and_Bcell BLD.CD15.PC Primary neutrophils from peripheral blood CD15_Primary_Cells PrimaryCell
E031 HSC & B-cell BLOOD HSC_and_Bcell BLD.CD19.CPC Primary B cells from cord blood CD19_Primary_Cells_Cord_BI PrimaryCell
E032 HSC & B-cell BLOOD HSC_and_Bcell BLD.CD19.PPC Primary B cells from peripheral blood CD19_Primary_Cells_Peripheral_UW PrimaryCell
E035 HSC & B-cell BLOOD HSC_and_Bcell BLD.CD34.PC Primary hematopoietic stem cells CD34_Primary_Cells PrimaryCell
E036 HSC & B-cell BLOOD HSC_and_Bcell BLD.CD34.CC Primary hematopoietic stem cells short term culture CD34_Cultured_Cells PrimaryCell
E046 HSC & B-cell BLOOD HSC_and_Bcell BLD.CD56.PC Primary Natural Killer cells from peripheral blood CD56_Primary_Cells PrimaryCell
E050 HSC & B-cell BLOOD HSC_and_Bcell BLD.MOB.CD34.PC.F Primary hematopoietic stem cells G-CSF-mobilized Female Mobilized_CD34_Primary_Cells_Female PrimaryCell
E051 HSC & B-cell BLOOD HSC_and_Bcell BLD.MOB.CD34.PC.M Primary hematopoietic stem cells G-CSF-mobilized Male Mobilized_CD34_Primary_Cells_Male PrimaryCell
E018 iPSC IPSC iPSC IPSC.15b iPS-15b Cells iPS-15b_Cell_Line PrimaryCulture
E019 iPSC IPSC iPSC IPSC.18 iPS-18 Cells iPS-18_Cell_Line PrimaryCulture
E020 iPSC IPSC iPSC IPSC.20B iPS-20b Cells iPS-20b_Cell_Line PrimaryCulture
E021 iPSC IPSC iPSC IPSC.DF.6.9 iPS DF 6.9 Cells iPS_DF_6.9_Cell_Line PrimaryCulture
E022 iPSC IPSC iPSC IPSC.DF.19.11 iPS DF 19.11 Cells iPS_DF_19.11_Cell_Line PrimaryCulture
E023 Mesench FAT Mesench FAT.MSC.DR.ADIP Mesenchymal Stem Cell Derived Adipocyte Cultured Cells Mesenchymal_Stem_Cell_Derived_Adipocyte_Cultured_Cells PrimaryCulture
E025 Mesench FAT Mesench FAT.ADIP.DR.MSC Adipose Derived Mesenchymal Stem Cell Cultured Cells Adipose_Derived_Mesenchymal_Stem_Cell_Cultured_Cells PrimaryCulture
E026 Mesench STROMAL_CONNECTIVE Mesench STRM.MRW.MSC Bone Marrow Derived Cultured Mesenchymal Stem Cells Bone_Marrow_Derived_Mesenchymal_Stem_Cell_Cultured_Cells PrimaryCulture
E049 Mesench STROMAL_CONNECTIVE Mesench STRM.CHON.MRW.DR.MSC Mesenchymal Stem Cell Derived Chondrocyte Cultured Cells Chondrocytes_from_Bone_Marrow_Derived_Mesenchymal_Stem_Cell_Cultured_Cells PrimaryCulture
E052 Myosat MUSCLE Muscle MUS.SAT Muscle Satellite Cultured Cells Muscle_Satellite_Cultured_Cells PrimaryCulture
E089 Muscle MUSCLE Muscle MUS.TRNK.FET Fetal Muscle Trunk Fetal_Muscle_Trunk PrimaryTissue
E090 Muscle MUSCLE_LEG Muscle MUS.LEG.FET Fetal Muscle Leg Fetal_Muscle_Leg PrimaryTissue
E100 Muscle MUSCLE Muscle MUS.PSOAS Psoas Muscle Psoas_Muscle PrimaryTissue
E107 Muscle MUSCLE Muscle MUS.SKLT.M Skeletal Muscle Male Skeletal_Muscle_Male PrimaryTissue
E108 Muscle MUSCLE Muscle MUS.SKLT.F Skeletal Muscle Female Skeletal_Muscle_Female PrimaryTissue
E017 IMR90 LUNG Other LNG.IMR90 IMR90 fetal lung fibroblasts Cell Line IMR90_Cell_Line CellLine
E063 Adipose FAT Other FAT.ADIP.NUC Adipose Nuclei Adipose_Nuclei PrimaryTissue
E066 Other LIVER Other LIV.ADLT Liver Adult_Liver PrimaryTissue
E076 Sm. Muscle GI_COLON Other GI.CLN.SM.MUS Colon Smooth Muscle Colon_Smooth_Muscle PrimaryTissue
E078 Sm. Muscle GI_DUODENUM Other GI.DUO.SM.MUS Duodenum Smooth Muscle Duodenum_Smooth_Muscle PrimaryTissue
E080 Other ADRENAL Other ADRL.GLND.FET Fetal Adrenal Gland Fetal_Adrenal_Gland PrimaryTissue
E086 Other KIDNEY Other KID.FET Fetal Kidney Fetal_Kidney PrimaryTissue
E087 Other PANCREAS Other PANC.ISLT Pancreatic Islets Pancreatic_Islets PrimaryTissue
E088 Other LUNG Other LNG.FET Fetal Lung Fetal_Lung PrimaryTissue
E091 Other PLACENTA Other PLCNT.FET Placenta Fetal_Placenta PrimaryTissue
E093 Thymus THYMUS Other THYM.FET Fetal Thymus Fetal_Thymus PrimaryTissue
E096 Other LUNG Other LNG Lung Lung PrimaryTissue
E097 Other OVARY Other OVRY Ovary Ovary PrimaryTissue
E098 Other PANCREAS Other PANC Pancreas Pancreas PrimaryTissue
E099 Other PLACENTA Other PLCNT.AMN Placenta Amnion Placenta_Amnion PrimaryTissue
E103 Sm. Muscle GI_RECTUM Other GI.RECT.SM.MUS Rectal Smooth Muscle Rectal_Smooth_Muscle PrimaryTissue
E111 Sm. Muscle GI_STOMACH Other GI.STMC.MUS Stomach Smooth Muscle Stomach_Smooth_Muscle PrimaryTissue
E112 Thymus THYMUS Other THYM Thymus Thymus PrimaryTissue
E113 Other SPLEEN Other SPLN Spleen Spleen PrimaryTissue
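
Since the Chromatin_States column described earlier refers to epigenomes by EID, a small lookup from EID to a readable name can be handy. The sketch below copies three rows from the table above into a named vector; `lookup_eid()` is a hypothetical helper, and a complete lookup would need all rows of the table.

```r
# Three rows copied from the table above: EID -> standardized epigenome name.
eid_names <- c(
  E033 = "Primary T cells from cord blood",
  E067 = "Brain Angular Gyrus",
  E095 = "Left Ventricle"
)

# Hypothetical helper: identifiers missing from the lookup come back as NA.
lookup_eid <- function(eids) unname(eid_names[eids])

lookup_eid(c("E067", "E999"))
```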

Session information

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] haploR_4.0.7
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.5       httr_1.4.7        RUnit_0.4.33      cli_3.6.3        
##  [5] knitr_1.48        rlang_1.1.4       xfun_0.49         jsonlite_1.8.9   
##  [9] glue_1.8.0        RJSONIO_1.3-1.9   DT_0.33           buildtools_1.0.0 
## [13] RCurl_1.98-1.16   plyr_1.8.9        htmltools_0.5.8.1 maketools_1.3.1  
## [17] XML_3.99-0.17     sys_3.4.3         sass_0.4.9        fansi_1.0.6      
## [21] rmarkdown_2.28    tibble_3.2.1      evaluate_1.0.1    jquerylib_0.1.4  
## [25] bitops_1.0-9      fastmap_1.2.0     yaml_2.3.10       lifecycle_1.0.4  
## [29] compiler_4.4.1    pkgconfig_2.0.3   Rcpp_1.0.13       htmlwidgets_1.6.4
## [33] digest_0.6.37     R6_2.5.1          utf8_1.2.4        curl_5.2.3       
## [37] pillar_1.9.0      magrittr_2.0.3    bslib_0.8.0       tools_4.4.1      
## [41] cachem_1.1.0

Boyle, A. P., E. L. Hong, M. Hariharan, Y. Cheng, M. A. Schaub, M. Kasowski, K. J. Karczewski, et al. 2012. “Annotation of Functional Variation in Personal Genomes Using RegulomeDB.” Genome Research 22 (9): 1790–97. https://doi.org/10.1101/gr.137323.112.
Ward, L. D., and M. Kellis. 2011. “HaploReg: A Resource for Exploring Chromatin States, Conservation, and Regulatory Motif Alterations Within Sets of Genetically Linked Variants.” Nucleic Acids Research 40 (D1): D930–34. https://doi.org/10.1093/nar/gkr917.