Title: | Single Nucleotide Polymorphisms Linkage Disequilibrium Visualizations |
---|---|
Description: | Linkage disequilibrium visualizations of up to several hundreds of single nucleotide polymorphisms (SNPs), annotated with chromosomic positions and gene names. Two types of plots are available for small numbers of SNPs (<40) and for large numbers (tested up to 500). Both can be extended by combining other ggplots, e.g. association studies results, and functions enable to directly visualize the effect of SNP selection methods, as minor allele frequency filtering and TagSNP selection, with a second correlation heatmap. The SNPs correlations are computed on Genotype Data objects from the 'GWASTools' package using the 'SNPRelate' package, and the plots are customizable 'ggplot2' and 'gtable' objects and are annotated using the 'biomaRt' package. Usage is detailed in the vignette with example data and results from up to 500 SNPs of 1,200 scans are in Charlon T. (2019) <doi:10.13097/archive-ouverte/unige:161795>. |
Authors: | Thomas Charlon [aut, cre] , Karl Forner [aut], Alessandro Di Cara [aut], Jérôme Wojcik [aut] |
Maintainer: | Thomas Charlon <[email protected]> |
License: | GPL-3 |
Version: | 1.2.0 |
Built: | 2025-01-08 07:01:47 UTC |
Source: | CRAN |
Pipe an object forward into a function or call expression and update the 'lhs' object with the resulting value. Magrittr imported function, see details and examples in the magrittr package.
lhs |
An object which serves both as the initial value and as target. |
rhs |
a function call using the magrittr semantics. |
None, used to update the value of lhs.
Pipe an object forward into a function or call expression. Magrittr imported function, see details and examples in the magrittr package.
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Result of rhs applied to lhs, see details in magrittr package.
Expose the names in 'lhs' to the 'rhs' expression. Magrittr imported function, see details and examples in the magrittr package.
lhs |
A list, environment, or a data.frame. |
rhs |
An expression where the names in lhs is available. |
Result of rhs applied to one or several names of lhs.
Compute Chi-squared p-values
chisq_pvalues( m_data, response, adjust_method = "fdr", mlog10_transform = TRUE, n_cores = 1, ... )
chisq_pvalues( m_data, response, adjust_method = "fdr", mlog10_transform = TRUE, n_cores = 1, ... )
m_data |
Data matrix of observations by variables |
response |
Response vector of length the number of observations |
adjust_method |
Multiple testing p-value adjustment method. Passed to stats::p.adjust. 'fdr' by default. |
mlog10_transform |
Logical, transform p-values by minus log10. True by default. |
n_cores |
Number of cores |
... |
Passed to stats::chisq.test |
Chi-squared p-values
Compute Chi-squared p-values on a Genotype data object
chisq_pvalues_gdata( gdata, snp_idxs, response_column = "region", response_value = "Europe", threshold = 2, ... )
chisq_pvalues_gdata( gdata, snp_idxs, response_column = "region", response_value = "Europe", threshold = 2, ... )
gdata |
Genotype data object |
snp_idxs |
SNPs indexes |
response_column |
Response column in gdata scans annotations data frame |
response_value |
Response value. The response vector will be a logical, true if equal to the value, false otherwise. |
threshold |
Keep only associations greater than the threshold |
... |
Passed to chisq_pvalues |
SNPs annotation data frame, chi-squared p-values in column pvalues
The data set consist of 103 common (>5% minor allele frequency) SNPs genotyped in 129 trios from an European-derived population. These SNPs are in a 500-kb region on human chromosome 5q31 implicated as containing a genetic risk factor for Crohn disease.
Imported from the gap R package.
An example use of the data is with the following paper, Kelly M. Burkett, Celia M. T. Greenwood, BradMcNeney, Jinko Graham. Gene genealogies for genetic association mapping, with application to Crohn's disease. Fron Genet 2013, 4(260) doi: 10.3389/fgene.2013.00260
data(crohn)
data(crohn)
A data frame containing 387 rows and 212 columns
MJ Daly, JD Rioux, SF Schaffner, TJ Hudson, ES Lander (2001) High-resolution haplotype structure in the human genome Nature Genetics 29:229-232
Diamond ggplot layer for ggplot_ld
diamond_annots(data, x = "x", y = "y", color = "color", size = 0.5)
diamond_annots(data, x = "x", y = "y", color = "color", size = 0.5)
data |
Data frame of 3 columns defining the diamonds |
x |
Name of the column for horizontal positions |
y |
Name of the column for vertical positions |
color |
Name of the column for color values |
size |
Radius of the diamonds |
gglayers
Fetch allele 1 (default object)
## Default S3 method: fetch_allele1(obj, snps_idx)
## Default S3 method: fetch_allele1(obj, snps_idx)
obj |
Default object |
snps_idx |
SNPs indexes |
Fetch allele 1 (GdsGenotypeReader object)
## S3 method for class 'GdsGenotypeReader' fetch_allele1(obj, snps_idx)
## S3 method for class 'GdsGenotypeReader' fetch_allele1(obj, snps_idx)
obj |
GenotypeData object |
snps_idx |
SNPs indexes |
Allele 1
Fetch allele 1 (GenotypeData object)
## S3 method for class 'GenotypeData' fetch_allele1(obj, ...)
## S3 method for class 'GenotypeData' fetch_allele1(obj, ...)
obj |
GenotypeData object |
... |
Passed to getAlleleA |
Allele 1
Fetch allele 1 (GenotypeDataSubset object)
## S3 method for class 'GenotypeDataSubset' fetch_allele1(obj, snps_idx)
## S3 method for class 'GenotypeDataSubset' fetch_allele1(obj, snps_idx)
obj |
GenotypeDataSubset object |
snps_idx |
SNPs indexes |
Allele 1
Fetch allele 2 (default object)
## Default S3 method: fetch_allele2(obj, snps_idx)
## Default S3 method: fetch_allele2(obj, snps_idx)
obj |
Default object |
snps_idx |
SNPs indexes |
Fetch allele 2 (GdsGenotypeReader object)
## S3 method for class 'GdsGenotypeReader' fetch_allele2(obj, snps_idx)
## S3 method for class 'GdsGenotypeReader' fetch_allele2(obj, snps_idx)
obj |
GenotypeData object |
snps_idx |
SNPs indexes |
Allele 2
Fetch allele 2 (GenotypeData object)
## S3 method for class 'GenotypeData' fetch_allele2(obj, ...)
## S3 method for class 'GenotypeData' fetch_allele2(obj, ...)
obj |
GenotypeData object |
... |
Passed to getAlleleB |
Allele 2
Fetch allele 1 (GenotypeDataSubset object)
## S3 method for class 'GenotypeDataSubset' fetch_allele2(obj, snps_idx)
## S3 method for class 'GenotypeDataSubset' fetch_allele2(obj, snps_idx)
obj |
GenotypeDataSubset object |
snps_idx |
SNPs indexes |
Allele 2
Fetch GDS (default)
## Default S3 method: fetch_gds(obj, ...)
## Default S3 method: fetch_gds(obj, ...)
obj |
Default object |
... |
Not passed |
Fetch GDS (GdsGenotypeReader)
## S3 method for class 'GdsGenotypeReader' fetch_gds(obj, ...)
## S3 method for class 'GdsGenotypeReader' fetch_gds(obj, ...)
obj |
GdsGenotypeReader object |
... |
Not passed |
S4 slot 'handler' of obj
Fetch GDS (GenotypeData)
## S3 method for class 'GenotypeData' fetch_gds(obj, ...)
## S3 method for class 'GenotypeData' fetch_gds(obj, ...)
obj |
GenotypeData object |
... |
Not passed |
fetch_gds output on S4 slot 'data' of obj
Fetch GDS (GenotypeDataSubset)
## S3 method for class 'GenotypeDataSubset' fetch_gds(obj, ...)
## S3 method for class 'GenotypeDataSubset' fetch_gds(obj, ...)
obj |
GenotypeDataSubset object |
... |
Not passed |
Add biomaRt gene annotations to Genotype Data object.
gdata_add_gene_annots( gdata, snp_idxs, rsids_colname = "probe_id", biomart_metadb = get_biomart_metadb() )
gdata_add_gene_annots( gdata, snp_idxs, rsids_colname = "probe_id", biomart_metadb = get_biomart_metadb() )
gdata |
Genotype Data object |
snp_idxs |
SNP indexes |
rsids_colname |
Column of SNP annotation data frame with rs identifiers |
biomart_metadb |
List with slots snpmart and ensembl, corresponding to the biomart databases to query for SNP identifiers and gene names, respectively. See get_biomart_metadb function. |
Genotype Data object
Add ancestry informative markers gene annotations to Genotype Data object. Convenience function for the vignette to avoid querying biomaRt on build.
gdata_add_gene_annots_aim_example(gdata, aim_idxs)
gdata_add_gene_annots_aim_example(gdata, aim_idxs)
gdata |
Genotype Data object |
aim_idxs |
AIM indexes in the example Genotype data object |
Genotype Data object
Add HLA-DR gene annotations to Genotype Data object. Convenience function for the vignette to avoid querying biomaRt on build.
gdata_add_gene_annots_hladr_example(gdata, hla_dr_idxs)
gdata_add_gene_annots_hladr_example(gdata, hla_dr_idxs)
gdata |
Genotype Data object |
hla_dr_idxs |
HLA-DR indexes in the example Genotype data object |
Genotype Data object
Get scans annotations from a Genotype Data object or a subset.
gdata_scans_annots(gdata, scan_ids)
gdata_scans_annots(gdata, scan_ids)
gdata |
Genotype Data object |
scan_ids |
Scan identifiers to subset |
Scans annotations data frame
Get SNPs annotations from a Genotype Data object or a subset.
gdata_snps_annots(gdata, snp_ids = NULL)
gdata_snps_annots(gdata, snp_ids = NULL)
gdata |
Genotype Data object |
snp_ids |
SNP identifiers to subset |
SNP annotation data frame
To query gene names of SNPs, it is necessary to retrieve two objects using biomaRt::useMart. First, the object required to map SNP rs identifiers to ENSEMBL identifiers. Second, the object required to map ENSEMBL identifiers to common gene names. The function returns a list of two slots named snpmart and ensembl corresponding to each one, respectively. Once obtained it is saved to a local file.
get_biomart_metadb( filepath = extdata_filepath("bmart_meta.rds"), host = "https://grch37.ensembl.org" )
get_biomart_metadb( filepath = extdata_filepath("bmart_meta.rds"), host = "https://grch37.ensembl.org" )
filepath |
Path to save the biomaRt objects |
host |
BiomaRt Ensembl host, by default https://grch37.ensembl.org |
List of slots snpmart and ensembl as detailed above
Get scans annotations (GenotypeData object)
## S3 method for class 'GenotypeData' get_scan_annot(obj, ...)
## S3 method for class 'GenotypeData' get_scan_annot(obj, ...)
obj |
GenotypeData object |
... |
Not passed |
Data frame
Get scans annotations (GenotypeDataSubset object)
## S3 method for class 'GenotypeDataSubset' get_scan_annot(obj, ...)
## S3 method for class 'GenotypeDataSubset' get_scan_annot(obj, ...)
obj |
GenotypeDataSubset object |
... |
Not passed |
Data frame
Get SNPs annotations (GenotypeData object)
## S3 method for class 'GenotypeData' get_snp_annot(obj, ...)
## S3 method for class 'GenotypeData' get_snp_annot(obj, ...)
obj |
GenotypeData object |
... |
Not passed |
Data frame
Get SNPs annotations (GenotypeDataSubset object)
## S3 method for class 'GenotypeDataSubset' get_snp_annot(obj, ...)
## S3 method for class 'GenotypeDataSubset' get_snp_annot(obj, ...)
obj |
GenotypeDataSubset object |
... |
Not passed |
Data frame
Get SNPs associations ggplot, either as points or as a linked area. Optionally add labels to most associated points using ggrepel.
ggplot_associations( df_snp, pvalue_colname = "pvalues", labels_colname = "probe_id", n_labels = 10, nudge = c(0, 1), linked_area = FALSE, byindex = linked_area, colors = if (linked_area) snp_position_colors(nrow(df_snp)) else "black" )
ggplot_associations( df_snp, pvalue_colname = "pvalues", labels_colname = "probe_id", n_labels = 10, nudge = c(0, 1), linked_area = FALSE, byindex = linked_area, colors = if (linked_area) snp_position_colors(nrow(df_snp)) else "black" )
df_snp |
SNP annotation data frame with columns chromosome, position, and as specified by parameters pvalue_colname and optionally labels_colname. |
pvalue_colname |
Column name of df_snp with association values |
labels_colname |
Optional column name of df_snp with labels. Set to NULL to remove. |
n_labels |
Number of labels of most associated points to display. |
nudge |
Nudge parameter passed to ggrepel::geom_label_repel. |
linked_area |
Add a linked area to associations points, default FALSE |
byindex |
Display by SNP index or chromosomic position (default) |
colors |
Colors of SNPs |
ggplot
Display SNP r2 correlations using points or diamonds with text.
ggplot_ld( df_ld, diamonds = length(unique(df_ld$SNP_A)) < 40, point_size = 120/sqrt(nrow(df_ld)), reverse = FALSE, reindex = TRUE )
ggplot_ld( df_ld, diamonds = length(unique(df_ld$SNP_A)) < 40, point_size = 120/sqrt(nrow(df_ld)), reverse = FALSE, reindex = TRUE )
df_ld |
Data frame with columns SNP_A, SNP_B, and R2. As returned by the snprelate_ld function. |
diamonds |
Should the values be displayed as diamonds or points ? Default is TRUE for less than 40 SNPs. |
point_size |
Size for geom_point. Ignored if diamonds is TRUE. |
reverse |
Reverse the display (horizontal symmetry) |
reindex |
If FALSE, SNPs are positionned following their IDs |
ggplot
Get SNPs position ggplot with mappings to combine with other ggplots. Optionally add labels and an upper subset.
ggplot_snp_pos( df_snp, upper_subset = NULL, labels_colname = NULL, colors = snp_position_colors(nrow(df_snp)) )
ggplot_snp_pos( df_snp, upper_subset = NULL, labels_colname = NULL, colors = snp_position_colors(nrow(df_snp)) )
df_snp |
SNP annotation data frame with a column named position and, if specified, one named as the labels_colname parameter. |
upper_subset |
Subset of df_snp for the positions on the upper side |
labels_colname |
Optional column name of df_snp to use as SNP labels. |
colors |
Colors for each SNP |
ggplot
Creates a gtable of linkage disequilibrium and chromosomic positions ggplots. A biplot_subset parameter is available to add a second linkage disequibrium ggplot to visualize the effect of a SNP selection.
gtable_ld( df_ld, df_snp, biplot_subset = NULL, labels_colname = NULL, diamonds = length(unique(df_ld$SNP_A)) < 40, point_size = ifelse(is.null(biplot_subset), 120, 80)/sqrt(nrow(df_ld)), title = "", title_biplot = "", ... )
gtable_ld( df_ld, df_snp, biplot_subset = NULL, labels_colname = NULL, diamonds = length(unique(df_ld$SNP_A)) < 40, point_size = ifelse(is.null(biplot_subset), 120, 80)/sqrt(nrow(df_ld)), title = "", title_biplot = "", ... )
df_ld |
Data frame returned by snprelate_ld |
df_snp |
SNP annotations with columns snpID and position |
biplot_subset |
SNP indexes of the subset for the second ld plot |
labels_colname |
Column name of df_snp to use as SNP labels |
diamonds |
Display the values as diamonds or as points Default is TRUE for less than 40 SNPs. |
point_size |
Size for geom_point. Ignored if diamonds is TRUE. |
title |
Plot title |
title_biplot |
Optional biplot title |
... |
Passed to ggplot_ld |
gtable of ggplots
library(snplinkage) gds_path <- save_hgdp_as_gds() gdata <- load_gds_as_genotype_data(gds_path) qc <- snprelate_qc(gdata, tagsnp = .99) snp_idxs_8p23 <- select_region_idxs(qc$gdata, chromosome = 8, position_min = 11e6, position_max = 12e6) df_ld <- snprelate_ld(qc$gdata, snps_idx = snp_idxs_8p23, quiet = TRUE) plt <- gtable_ld(df_ld, df_snp = gdata_snps_annots(qc$gdata))
library(snplinkage) gds_path <- save_hgdp_as_gds() gdata <- load_gds_as_genotype_data(gds_path) qc <- snprelate_qc(gdata, tagsnp = .99) snp_idxs_8p23 <- select_region_idxs(qc$gdata, chromosome = 8, position_min = 11e6, position_max = 12e6) df_ld <- snprelate_ld(qc$gdata, snps_idx = snp_idxs_8p23, quiet = TRUE) plt <- gtable_ld(df_ld, df_snp = gdata_snps_annots(qc$gdata))
Creates a gtable of a linkage disequilibrium, chromosomic positions, and association scores ggplots.
gtable_ld_associations( df_assocs, df_ld, pvalue_colname = "pvalues", labels_colname = "probe_id", n_labels = 5, diamonds = nrow(df_assocs) <= 40, linked_area = diamonds, point_size = 150/nrow(df_assocs), colors = snp_position_colors(nrow(df_assocs)), ... )
gtable_ld_associations( df_assocs, df_ld, pvalue_colname = "pvalues", labels_colname = "probe_id", n_labels = 5, diamonds = nrow(df_assocs) <= 40, linked_area = diamonds, point_size = 150/nrow(df_assocs), colors = snp_position_colors(nrow(df_assocs)), ... )
df_assocs |
SNP annotation data frame with columns chromosome, position, and as specified by parameters pvalue_colname and optionally labels_colname. |
df_ld |
Data frame with columns SNP_A, SNP_B, and R2, as returned by the snprelate_ld function. |
pvalue_colname |
Column name of df_snp with association values |
labels_colname |
Optional column name of df_snp with labels. Set NULL to remove labels. |
n_labels |
Number of labels of most associated SNPs to display. |
diamonds |
Should the values be displayed as diamonds or points ? Default is TRUE for up to 40 SNPs. |
linked_area |
Add a linked area to associations points. Default same as diamonds. |
point_size |
Point size for ggplot_ld, ignored if diamonds is TRUE. |
colors |
Colors of SNPs |
... |
Passed to ggplot_associations |
gtable
Build gtable by combining ggplots
gtable_ld_associations_combine(ggplots, diamonds)
gtable_ld_associations_combine(ggplots, diamonds)
ggplots |
List of ggplots |
diamonds |
Does the LD visualization use diamond-type layout |
gtable of ggplots
library(snplinkage) # example rnaseq data frame, 20 variables of 20 patients m_rna = matrix(runif(20 ^ 2), nrow = 20) # pair-wise correlation matrix m_ld = cor(m_rna) ^ 2 # keep only upper triangle and reshape to data frame m_ld[lower.tri(m_ld, diag = TRUE)] = NA df_ld = reshape2::melt(m_ld) |> na.omit() # rename for SNPLinkage names(df_ld) = c('SNP_A', 'SNP_B', 'R2') # visualize with ggplot_ld gg_ld = ggplot_ld(df_ld) # let's imagine the 20 variables came from 3 physically close regions positions = c(runif(7, 10e5, 15e5), runif(6, 25e5, 30e5), runif(7, 45e5, 50e5)) |> sort() # build the dataframe df_snp_pos = data.frame(position = positions) df_snp_pos$label = c(rep('HLA-A', 7), rep('HLA-B', 6), rep('HLA-C', 7)) gg_pos_biplot = ggplot_snp_pos(df_snp_pos, labels_colname = 'label', upper_subset = TRUE) # let's assume HLA-B is more associated with the outcome than the other genes pvalues = c(runif(7, 1e-3, 1e-2), runif(6, 1e-8, 1e-6), runif(7, 1e-3, 1e-2)) log10_pvals = -log10(pvalues) # we can reuse the df_snp_pos object df_snp_pos$pvalues = log10_pvals # add the chromosome column df_snp_pos$chromosome = 6 gg_assocs = ggplot_associations(df_snp_pos, labels_colname = 'label', linked_area = TRUE, nudge = c(0, 0.5), n_labels = 12) l_ggs = list(pos = gg_pos_biplot, ld = gg_ld, pval = gg_assocs) gt_ld = gtable_ld_associations_combine(l_ggs, diamonds = TRUE) grid::grid.draw(gt_ld)
library(snplinkage) # example rnaseq data frame, 20 variables of 20 patients m_rna = matrix(runif(20 ^ 2), nrow = 20) # pair-wise correlation matrix m_ld = cor(m_rna) ^ 2 # keep only upper triangle and reshape to data frame m_ld[lower.tri(m_ld, diag = TRUE)] = NA df_ld = reshape2::melt(m_ld) |> na.omit() # rename for SNPLinkage names(df_ld) = c('SNP_A', 'SNP_B', 'R2') # visualize with ggplot_ld gg_ld = ggplot_ld(df_ld) # let's imagine the 20 variables came from 3 physically close regions positions = c(runif(7, 10e5, 15e5), runif(6, 25e5, 30e5), runif(7, 45e5, 50e5)) |> sort() # build the dataframe df_snp_pos = data.frame(position = positions) df_snp_pos$label = c(rep('HLA-A', 7), rep('HLA-B', 6), rep('HLA-C', 7)) gg_pos_biplot = ggplot_snp_pos(df_snp_pos, labels_colname = 'label', upper_subset = TRUE) # let's assume HLA-B is more associated with the outcome than the other genes pvalues = c(runif(7, 1e-3, 1e-2), runif(6, 1e-8, 1e-6), runif(7, 1e-3, 1e-2)) log10_pvals = -log10(pvalues) # we can reuse the df_snp_pos object df_snp_pos$pvalues = log10_pvals # add the chromosome column df_snp_pos$chromosome = 6 gg_assocs = ggplot_associations(df_snp_pos, labels_colname = 'label', linked_area = TRUE, nudge = c(0, 0.5), n_labels = 12) l_ggs = list(pos = gg_pos_biplot, ld = gg_ld, pval = gg_assocs) gt_ld = gtable_ld_associations_combine(l_ggs, diamonds = TRUE) grid::grid.draw(gt_ld)
Compute linkage disequilibrium using snprelate_ld on the set of SNPs in the associations data frame and call gtable_ld_associations. Creates a gtable of a linkage disequilibrium, chromosomic positions, and association scores ggplots.
gtable_ld_associations_gdata( df_assocs, gdata, pvalue_colname = "pvalues", labels_colname = "probe_id", diamonds = nrow(df_assocs) <= 40, window = 15, ... )
gtable_ld_associations_gdata( df_assocs, gdata, pvalue_colname = "pvalues", labels_colname = "probe_id", diamonds = nrow(df_assocs) <= 40, window = 15, ... )
df_assocs |
SNP annotation data frame with columns chromosome, position, and as specified by parameters pvalue_colname and optionally labels_colname. |
gdata |
GenotypeData object, as returned by load_gds_as_genotype_data |
pvalue_colname |
Column name of df_snp with association values |
labels_colname |
Optional column name of df_snp with labels. Set NULL to remove labels. |
diamonds |
Should the values be displayed as diamonds or points ? Default is TRUE for up to 40 SNPs. |
window |
Window size for snprelate_ld. Forced to the total number of SNPs if diamonds is FALSE |
... |
Passed to gtable_ld_associations |
gtable
library(snplinkage) gds_path <- save_hgdp_as_gds() gdata <- load_gds_as_genotype_data(gds_path) qc <- snprelate_qc(gdata, tagsnp = .99) snp_idxs_mhc <- select_region_idxs(qc$gdata, chromosome = 6, position_min = 29e6, position_max = 33e6) df_assocs <- chisq_pvalues_gdata(qc$gdata, snp_idxs_mhc) df_top_aim <- subset(df_assocs, rank(-pvalues, ties.method = 'first') <= 20) #qc$gdata <- gdata_add_gene_annots(qc$gdata, rownames(df_top_aim)) qc$gdata <- gdata_add_gene_annots_aim_example(qc$gdata, rownames(df_top_aim)) plt <- gtable_ld_associations_gdata(df_top_aim, qc$gdata, labels_colname = 'gene')
library(snplinkage) gds_path <- save_hgdp_as_gds() gdata <- load_gds_as_genotype_data(gds_path) qc <- snprelate_qc(gdata, tagsnp = .99) snp_idxs_mhc <- select_region_idxs(qc$gdata, chromosome = 6, position_min = 29e6, position_max = 33e6) df_assocs <- chisq_pvalues_gdata(qc$gdata, snp_idxs_mhc) df_top_aim <- subset(df_assocs, rank(-pvalues, ties.method = 'first') <= 20) #qc$gdata <- gdata_add_gene_annots(qc$gdata, rownames(df_top_aim)) qc$gdata <- gdata_add_gene_annots_aim_example(qc$gdata, rownames(df_top_aim)) plt <- gtable_ld_associations_gdata(df_top_aim, qc$gdata, labels_colname = 'gene')
Compute linkage disequilibrium using snprelate_ld on a set of SNP indexes and call gtable_ld. Two parameters are available to compute and compare minor allele frequency filtering and TagSNP selection by displaying two LD plots with their positions in the center. The maf and r2 parameters are used similarly and as follows: - compare baseline with MAF 5 gtable_ld(gdata, snps_idx, maf = 0.05) - compare baseline with TagSNP r2 = 0.8 gtable_ld(gdata, snps_idx, r2 = 0.8) - compare 5 gtable_ld(gdata, snps_idx, maf = c(0.05, 0.05), r2 = 0.8) - compare MAF 5 gtable_ld(gdata, snps_idx, maf = c(0.05, 0.1), r2 = c(0.8, 0.6))
gtable_ld_gdata( gdata, snps_idx, maf = NULL, r2 = NULL, diamonds = length(snps_idx) < 40, window = 15, autotitle = TRUE, autotitle_bp = TRUE, double_title = FALSE, ... )
gtable_ld_gdata( gdata, snps_idx, maf = NULL, r2 = NULL, diamonds = length(snps_idx) < 40, window = 15, autotitle = TRUE, autotitle_bp = TRUE, double_title = FALSE, ... )
gdata |
GenotypeData object returned by load_gds_as_genotype_data |
snps_idx |
SNPs indexes to select |
maf |
Minor allele frequency threshold(s), see description |
r2 |
TagSNP r2 threshold(s), see description |
diamonds |
Display the values as diamonds or as points Default is TRUE for less than 40 SNPs. |
window |
Window size for snprelate_ld. Forced to the total number of SNPs if diamonds is FALSE |
autotitle |
Set title to feature selection method(s), number of SNPs and chromosome |
autotitle_bp |
Set biplot title to feature selection method(s), number of SNPs and chromosome |
double_title |
Logical, if false (default) keep only biplot title |
... |
Passed to gtable_ld |
gtable of ggplots
library(snplinkage) gds_path <- save_hgdp_as_gds() gdata <- load_gds_as_genotype_data(gds_path) qc <- snprelate_qc(gdata, tagsnp = .99) snp_idxs_1p13_large <- select_region_idxs(qc$gdata, chromosome = 1, position_min = 114e6, n_snps = 100) plt <- gtable_ld_gdata(qc$gdata, snp_idxs_1p13_large)
library(snplinkage) gds_path <- save_hgdp_as_gds() gdata <- load_gds_as_genotype_data(gds_path) qc <- snprelate_qc(gdata, tagsnp = .99) snp_idxs_1p13_large <- select_region_idxs(qc$gdata, chromosome = 1, position_min = 114e6, n_snps = 100) plt <- gtable_ld_gdata(qc$gdata, snp_idxs_1p13_large)
Build gtable by combining ggplots
gtable_ld_grobs(plots, labels_colname, title)
gtable_ld_grobs(plots, labels_colname, title)
plots |
List of ggplots |
labels_colname |
Does the SNP position plot contain labels |
title |
Title text string |
gtable of ggplots
library(snplinkage) # example rnaseq data frame, 20 variables of 20 patients m_rna = matrix(runif(20 ^ 2), nrow = 20) # pair-wise correlation matrix m_ld = cor(m_rna) ^ 2 # keep only upper triangle and reshape to data frame m_ld[lower.tri(m_ld, diag = TRUE)] = NA df_ld = reshape2::melt(m_ld) |> na.omit() # rename for SNPLinkage names(df_ld) = c('SNP_A', 'SNP_B', 'R2') # visualize with ggplot_ld gg_ld = ggplot_ld(df_ld) # let's imagine the 20 variables came from 3 physically close regions positions = c(runif(7, 10e5, 15e5), runif(6, 25e5, 30e5), runif(7, 45e5, 50e5)) |> sort() # build the dataframe df_snp_pos = data.frame(position = positions) df_snp_pos$label = c(rep('HLA-A', 7), rep('HLA-B', 6), rep('HLA-C', 7)) gg_snp_pos = ggplot_snp_pos(df_snp_pos, labels_colname = 'label') l_ggs = list(snp_pos = gg_snp_pos, ld = gg_ld) gt_ld = gtable_ld_grobs(l_ggs, labels_colname = TRUE, title = 'RNASeq correlations') grid::grid.draw(gt_ld)
library(snplinkage) # example rnaseq data frame, 20 variables of 20 patients m_rna = matrix(runif(20 ^ 2), nrow = 20) # pair-wise correlation matrix m_ld = cor(m_rna) ^ 2 # keep only upper triangle and reshape to data frame m_ld[lower.tri(m_ld, diag = TRUE)] = NA df_ld = reshape2::melt(m_ld) |> na.omit() # rename for SNPLinkage names(df_ld) = c('SNP_A', 'SNP_B', 'R2') # visualize with ggplot_ld gg_ld = ggplot_ld(df_ld) # let's imagine the 20 variables came from 3 physically close regions positions = c(runif(7, 10e5, 15e5), runif(6, 25e5, 30e5), runif(7, 45e5, 50e5)) |> sort() # build the dataframe df_snp_pos = data.frame(position = positions) df_snp_pos$label = c(rep('HLA-A', 7), rep('HLA-B', 6), rep('HLA-C', 7)) gg_snp_pos = ggplot_snp_pos(df_snp_pos, labels_colname = 'label') l_ggs = list(snp_pos = gg_snp_pos, ld = gg_ld) gt_ld = gtable_ld_grobs(l_ggs, labels_colname = TRUE, title = 'RNASeq correlations') grid::grid.draw(gt_ld)
Is SNP first dimension (default)
## Default S3 method: is_snp_first_dim(obj, ...)
## Default S3 method: is_snp_first_dim(obj, ...)
obj |
Default object |
... |
Not passed |
NA
Is SNP first dimension (GDS object)
## S3 method for class 'gds.class' is_snp_first_dim(obj, ...)
## S3 method for class 'gds.class' is_snp_first_dim(obj, ...)
obj |
GDS object |
... |
Not passed |
Logical, TRUE if SNP is first dimension
Is SNP first dimension (GdsGenotypeReader object)
## S3 method for class 'GdsGenotypeReader' is_snp_first_dim(obj, ...)
## S3 method for class 'GdsGenotypeReader' is_snp_first_dim(obj, ...)
obj |
GdsGenotypeReader object |
... |
Not passed |
is_snp_first_dim output on S4 slot 'handler'
Is SNP first dimension (GenotypeData object)
## S3 method for class 'GenotypeData' is_snp_first_dim(obj, ...)
## S3 method for class 'GenotypeData' is_snp_first_dim(obj, ...)
obj |
Genotype data object |
... |
Not passed |
is_snp_first_dim output on S4 slot 'data'
Is SNP first dimension (MatrixGenotypeReader object)
## S3 method for class 'MatrixGenotypeReader' is_snp_first_dim(obj, ...)
## S3 method for class 'MatrixGenotypeReader' is_snp_first_dim(obj, ...)
obj |
MatrixGenotypeReader object |
... |
Not passed |
TRUE
Is SNP first dimension (NcdfGenotypeReader object)
## S3 method for class 'NcdfGenotypeReader' is_snp_first_dim(obj, ...)
## S3 method for class 'NcdfGenotypeReader' is_snp_first_dim(obj, ...)
obj |
NcdfGenotypeReader object |
... |
Not passed |
TRUE
Open a connection to a snpgds file (cf. SNPRelate package) as a Genotype Data object.
load_gds_as_genotype_data( gds_file, read_snp_annot = TRUE, read_scan_annot = TRUE )
load_gds_as_genotype_data( gds_file, read_snp_annot = TRUE, read_scan_annot = TRUE )
gds_file |
Path of snpgds file |
read_snp_annot |
Read the SNPs' annotations |
read_scan_annot |
Read the scans' annotations |
Genotype Data object
library(snplinkage) gds_path <- save_hgdp_as_gds() gdata <- load_gds_as_genotype_data(gds_path)
library(snplinkage) gds_path <- save_hgdp_as_gds() gdata <- load_gds_as_genotype_data(gds_path)
Separate a matrix in a list of matrices of length the number of cores and apply a function on the columns in parallel
parallel_apply(m_data, apply_fun, n_cores = 1, ...)
parallel_apply(m_data, apply_fun, n_cores = 1, ...)
m_data |
Data matrix |
apply_fun |
Function to apply |
n_cores |
Number of cores |
... |
Passed to apply_fun |
apply_fun return
Print information about quality control performed by the snprelate_qc function.
print_qc_as_tex_table( gdata_qc, label = "qc", caption = paste("Quality control and feature selection of the subset of the", "human genome diversity project dataset.") )
print_qc_as_tex_table( gdata_qc, label = "qc", caption = paste("Quality control and feature selection of the subset of the", "human genome diversity project dataset.") )
gdata_qc |
Genotype Data object object returned by snprelate_qc |
label |
Label of the Tex table |
caption |
Caption of the Tex table |
Prints knitr::kable object using cat
Save the HGDP SNP data text file as a Genomic Data Structure file
save_hgdp_as_gds(paths = hgdp_filepaths(), outpath = tempfile(), ...)
save_hgdp_as_gds(paths = hgdp_filepaths(), outpath = tempfile(), ...)
paths |
Paths of the zip, txt, and gds files |
outpath |
Output GDS file path |
... |
Passed to save_genotype_data_as_gds |
Path of the saved gds file
Select SNP indexes corresponding to a specific genomic region.
select_region_idxs( gdata, chromosome, position_min = -Inf, position_max = Inf, n_snps = 0, offset = 0 )
select_region_idxs( gdata, chromosome, position_min = -Inf, position_max = Inf, n_snps = 0, offset = 0 )
gdata |
Genotype Data object |
chromosome |
Chromosome to select |
position_min |
Minimum base pair position to select |
position_max |
Maximum base pair position to select |
n_snps |
Maximum number of SNPs to return |
offset |
Number of SNPs to offset |
SNP indexes of Genotype Data object
Wrapper over SNPRelate::snpgdsSNPRateFreq
snprelate_allele_frequencies( gdata, snps_idx = NULL, scans_idx = NULL, quiet = FALSE )
snprelate_allele_frequencies( gdata, snps_idx = NULL, scans_idx = NULL, quiet = FALSE )
gdata |
A GenotypeData object |
snps_idx |
Vector of snps indices |
scans_idx |
Vector of scans indices |
quiet |
Whether to be quiet |
A data frame of snps_idx, snps_ids, allele1, allele2, maf, missing where allele1 and allele2 are the rates of the alleles, and maf the minimum of the 2. Missing is the missing rate. N.B: the allele rates are computed on the non missing genotypes, i.e. their sum equals 1.
Wrapper for snpgdsLDMat to compute r2
snprelate_ld( gdata, window_size = 0, min_r2 = 0, snps_idx = NULL, scans_idx = NULL, threads = 1, quiet = FALSE )
snprelate_ld( gdata, window_size = 0, min_r2 = 0, snps_idx = NULL, scans_idx = NULL, threads = 1, quiet = FALSE )
gdata |
A GenotypeData object |
window_size |
Max number of SNPs in LD window, 0 for no window |
min_r2 |
Minimum r2 value to report |
snps_idx |
Indices of snps to use |
scans_idx |
Indices of scans to use |
threads |
The number of threads to use |
quiet |
Whether to be quiet |
A data frame with columns SNP_A, SNP_B, R2 for r2 >= min_r2
The tagged snp set is (by sliding window) representative and strongly not redundant.
snprelate_ld_select( gdata, window_length = 500L, min_r2, window_size = NA, snps_idx = NULL, scans_idx = NULL, remove.monosnp = FALSE, autosome.only = FALSE, method = "r", threads = 1, quiet = FALSE, ... )
snprelate_ld_select( gdata, window_length = 500L, min_r2, window_size = NA, snps_idx = NULL, scans_idx = NULL, remove.monosnp = FALSE, autosome.only = FALSE, method = "r", threads = 1, quiet = FALSE, ... )
gdata |
A GenotypeData object |
window_length |
Max length in kb of the window |
min_r2 |
Minimum r2 value to report |
window_size |
Max number of SNPs in LD window |
snps_idx |
Indices of snps to use |
scans_idx |
Indices of scans to use |
remove.monosnp |
if TRUE, remove monomorphic SNPs |
autosome.only |
if |
method |
"composite", "r", "dprime", "corr", see details |
threads |
The number of threads to use, currently ignored |
quiet |
Whether to be quiet |
... |
Forwarded to SNPRelate::snpgdsLDpruning |
A list of SNP IDs stratified by chromosomes.
Quality control using SNPRelate functions.
snprelate_qc( gdata, samples_nas = 0.03, ibs = 0.99, keep_ids = NULL, snps_nas = 0.01, maf = 0.05, tagsnp = 0.8, n_cores = 1 )
snprelate_qc( gdata, samples_nas = 0.03, ibs = 0.99, keep_ids = NULL, snps_nas = 0.01, maf = 0.05, tagsnp = 0.8, n_cores = 1 )
gdata |
Genotype data object |
samples_nas |
NA threshold for samples, default 3 pct |
ibs |
Samples identity by state threshold, default 99 pct |
keep_ids |
Samples ids to keep even if IBS is higher than threshold. Used for monozygotic twins. |
snps_nas |
NA threshold for SNPs, default 1 pct |
maf |
Minor allele frequency threshold, default 5 pct |
tagsnp |
TagSNP r2 correlation threshold, default 0.8 |
n_cores |
Number of cores |
List of gdata, Genotype data object, and df_qc, QC info data frame
library(snplinkage) gds_path <- save_hgdp_as_gds() gdata <- load_gds_as_genotype_data(gds_path) qc <- snprelate_qc(gdata, tagsnp = .99)
library(snplinkage) gds_path <- save_hgdp_as_gds() gdata <- load_gds_as_genotype_data(gds_path) qc <- snprelate_qc(gdata, tagsnp = .99)