| Title: | S4 Tools for Reading and Organizing Genetic Data |
|---|---|
| Description: | Provides an integrated suite of tools for handling single nucleotide polymorphism (SNP) genotype data in large-scale genetic studies. Supports importing and merging genotype files, performing quality control on SNP markers and samples, and preparing data for downstream analyses using popular software such as 'FImpute' and 'PLINK'. Offers S4 classes and methods to efficiently encapsulate SNP data, along with utilities for generating genotype summary statistics and visualization. Additional functionalities include anticlustering approaches for batch effect control, automated script generation for external software, and streamlined workflows for large datasets commonly encountered in animal and plant breeding programs. Designed to facilitate reproducible and scalable SNP data analyses in quantitative and statistical genetics. |
| Authors: | Vinícius Junqueira [aut, cre], Roberto Higa [aut], Fernando Flores Cardoso [aut], Marcos Jun Iti Yokoo [aut] |
| Maintainer: | Vinícius Junqueira <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-06-26 18:38:08 UTC |
| Source: | https://github.com/cran/SNPkit |
This function converts a genotype matrix coded as 0/1/2/NA or AA/AB/BB to a
snpStats::SnpMatrix object. It includes checks for coding validity,
missing values, and duplicate sample or SNP IDs, and preserves row and column
names from the input.
as_snpmatrix(
  geno,
  coding = c("012", "AAABBB"),
  missing_codes = c("NA", "-9", ".", ""),
  check_ids = TRUE
)
geno |
A samples x SNPs matrix or data.frame with genotypes coded as
0, 1, 2, or NA. Can be numeric/integer or character. |
coding |
One of "012" or "AAABBB". |
missing_codes |
Character values to treat as missing (only used when
|
check_ids |
If |
The function accepts both matrix and data.frame inputs. For
data.frame objects, all columns are coerced to a common type using
as.matrix(), which preserves rownames and colnames.
The returned SnpMatrix object stores each genotype as a single byte,
which is memory-efficient compared to 4-byte integer storage. However, large datasets
still require substantial RAM. For very large genotype sets, consider
on-disk backends such as GDS files (SNPRelate) or file-backed matrices (bigsnpr).
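As a rough illustration of the storage point above, the sketch below compares the approximate size of a one-byte-per-genotype matrix with a 4-byte integer matrix of the same dimensions. The helper `estimate_geno_gib` is hypothetical (not part of the package) and ignores object overhead such as dimnames.

```r
# Hypothetical helper (not part of SNPkit): approximate memory footprint,
# in GiB, of an n_samples x n_snps genotype matrix. Object headers and
# dimnames are ignored, so real objects are slightly larger.
estimate_geno_gib <- function(n_samples, n_snps) {
  cells <- as.numeric(n_samples) * as.numeric(n_snps)
  c(raw_1_byte = cells, integer_4_bytes = 4 * cells) / 2^30
}

# Example: 10,000 samples genotyped at 50,000 SNPs
est <- estimate_geno_gib(10000, 50000)
print(round(est, 2))
```

Even at one byte per genotype, the size grows with the product of samples and SNPs, which is why an on-disk backend may be preferable for very large panels.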
A snpStats::SnpMatrix with the same dimnames as geno.
# Numeric 0/1/2 with NAs
set.seed(1)
geno <- matrix(sample(c(0L, 1L, 2L, NA), 20, replace = TRUE), nrow = 5)
rownames(geno) <- paste0("ind", 1:5)
colnames(geno) <- paste0("snp", 1:4)
SM <- as_snpmatrix(geno)

# Character AA/AB/BB
geno_c <- matrix(sample(c("AA", "AB", "BB", "."), 20, replace = TRUE,
                        prob = c(.35, .3, .3, .05)), nrow = 5)
rownames(geno_c) <- rownames(geno)
colnames(geno_c) <- colnames(geno)
SMc <- as_snpmatrix(geno_c, coding = "AAABBB", missing_codes = ".")
This function performs a column-wise binding of multiple SnpMatrix objects,
explicitly preserving row names and column names, avoiding unexpected "object has no names" warnings.
cbind_SnpMatrix(...)
... |
SnpMatrix objects to combine (must have identical row names). |
A single combined SnpMatrix with preserved row and column names.
m1 <- methods::new("SnpMatrix",
  matrix(as.raw(1:3), nrow = 3, ncol = 2,
         dimnames = list(c("S1", "S2", "S3"), c("SNP1", "SNP2"))))
m2 <- methods::new("SnpMatrix",
  matrix(as.raw(1:3), nrow = 3, ncol = 2,
         dimnames = list(c("S1", "S2", "S3"), c("SNP3", "SNP4"))))
cbind_SnpMatrix(m1, m2)
Identifies SNPs with call rates below a minimum threshold.
check.call.rate(summary, min.call.rate)
summary |
A data frame with SNP summary statistics (must contain 'Call.rate' column). |
min.call.rate |
Numeric value specifying the minimum acceptable call rate. |
Character vector with SNP names below threshold. Returns 'NULL' if none.
df <- data.frame(Call.rate = c(0.85, 0.95), row.names = c("SNP1", "SNP2"))
check.call.rate(df, 0.9)
Checks IBS status for two genotypes.
check.ibs(gen)
gen |
Numeric vector of length two with genotype codes. |
Integer: 2 if identical non-heterozygotes, 0 if opposite homozygotes, -1 otherwise.
check.ibs(c(1, 1))
check.ibs(c(1, 3))
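The return rule stated above can be sketched as follows. This is an illustration only, not the package's implementation: it assumes genotypes are coded 1 = AA, 2 = AB, 3 = BB (as the examples suggest), and treating missing values as "otherwise" is a further assumption. The name `ibs_status_sketch` is hypothetical.

```r
# Hypothetical re-statement of the documented rule (not the package code).
# Assumed coding: 1 = AA, 2 = AB, 3 = BB.
ibs_status_sketch <- function(gen) {
  stopifnot(length(gen) == 2)
  if (anyNA(gen)) return(-1L)                       # assumption: NA -> "otherwise"
  hom <- gen %in% c(1, 3)                           # which genotypes are homozygous
  if (all(hom) && gen[1] == gen[2]) return(2L)      # identical homozygotes
  if (all(hom) && gen[1] != gen[2]) return(0L)      # opposite homozygotes
  -1L                                               # at least one heterozygote
}

same <- ibs_status_sketch(c(1, 1))
opposite <- ibs_status_sketch(c(1, 3))
het <- ibs_status_sketch(c(2, 2))
```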
Identifies sample pairs considered identical based on genotype distances.
check.identical.samples(genotypes, threshold = 0)
genotypes |
Genotype matrix (samples x SNPs) or SnpMatrix. |
threshold |
Numeric distance threshold. Default 0. |
Data frame of identical sample pairs.
mat <- matrix(sample(0:2, 20, TRUE), nrow = 5)
rownames(mat) <- paste0("S", 1:5)
check.identical.samples(mat, 0.5)
Identifies identical samples within SNP blocks.
check.identical.samples.by.block(genotypes, blcsize, threshold = 0)
genotypes |
Genotype matrix. |
blcsize |
Block size (number of SNPs). |
threshold |
Distance threshold. Default 0. |
List of identical sample pairs.
set.seed(1)
mat <- matrix(sample(1:3, 40, TRUE), nrow = 4)
rownames(mat) <- paste0("S", 1:4)
check.identical.samples.by.block(mat, blcsize = 5, threshold = 0)
Identifies Mendelian inconsistencies between father-child pairs.
check.mendelian.inconsistencies(genotypes, father, child)
genotypes |
Genotype matrix. |
father |
Vector of father sample IDs. |
child |
Vector of child sample IDs. |
Data frame summarizing inconsistencies per pair.
set.seed(1)
genotypes <- matrix(sample(1:3, 30, TRUE), nrow = 3,
                    dimnames = list(c("F1", "C1", "C2"), NULL))
check.mendelian.inconsistencies(genotypes, father = "F1", child = c("C1", "C2"))
Calculates number of inconsistencies and total comparable SNPs for a parent-child pair.
check.mendelian.inconsistencies.pair(g1, g2)
g1 |
Genotype vector for parent. |
g2 |
Genotype vector for child. |
Numeric vector: [# inconsistencies, # comparable SNPs].
g1 <- c(1, 1, 3, 3, 2)
g2 <- c(3, 1, 1, 3, 2)
check.mendelian.inconsistencies.pair(g1, g2)
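For a single parent-child pair, a Mendelian inconsistency is usually taken to be a pair of opposite homozygotes. The sketch below counts these under two assumptions that the documentation does not confirm: genotypes are coded 1 = AA, 2 = AB, 3 = BB, and "comparable" means both genotypes are non-missing. The name `mendel_pair_sketch` is hypothetical.

```r
# Hypothetical sketch (not the package code). Assumptions:
#  - coding 1 = AA, 2 = AB, 3 = BB
#  - a SNP is "comparable" when both genotypes are non-missing
mendel_pair_sketch <- function(g1, g2) {
  comparable <- !is.na(g1) & !is.na(g2)
  # Opposite homozygotes cannot occur between a parent and its child
  opposite <- comparable & ((g1 == 1 & g2 == 3) | (g1 == 3 & g2 == 1))
  c(inconsistencies = sum(opposite), comparable = sum(comparable))
}

res <- mendel_pair_sketch(c(1, 1, 3, 3, 2), c(3, 1, 1, 3, 2))
print(res)
```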
Identifies samples with call rate below a given threshold.
check.sample.call.rate(sample.summary, min.call.rate)
sample.summary |
A data frame with a "Call.rate" column for each sample. |
min.call.rate |
Minimum acceptable call rate (between 0 and 1). |
A character vector with the names of samples to remove.
Identifies samples with heterozygosity values deviating beyond a specified threshold.
check.sample.heterozygosity(sample.summary, max.dev)
sample.summary |
Data frame containing sample summary (must have 'Heterozygosity' column). |
max.dev |
Maximum number of standard deviations allowed from mean. |
Character vector with sample names considered outliers. Returns 'NULL' if none.
ss <- data.frame(Heterozygosity = c(0.2, 0.5, 0.7))
rownames(ss) <- c("Ind1", "Ind2", "Ind3")
check.sample.heterozygosity(ss, 1)
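The standard-deviation rule described above can be sketched in base R as follows. This is an illustration under the assumption that deviation is measured as the absolute distance from the sample mean in units of the sample standard deviation; the package may compute it differently. The name `het_outliers_sketch` is hypothetical.

```r
# Hypothetical sketch (not the package code): flag samples whose
# heterozygosity lies more than max.dev standard deviations from the mean.
het_outliers_sketch <- function(sample.summary, max.dev) {
  het <- sample.summary$Heterozygosity
  z <- abs(het - mean(het, na.rm = TRUE)) / stats::sd(het, na.rm = TRUE)
  out <- rownames(sample.summary)[!is.na(z) & z > max.dev]
  if (length(out) == 0) NULL else out
}

ss <- data.frame(Heterozygosity = c(0.2, 0.5, 0.7))
rownames(ss) <- c("Ind1", "Ind2", "Ind3")
outliers <- het_outliers_sketch(ss, 1)
print(outliers)
```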
Filters SNP names belonging to specified chromosomes.
check.snp.chromo(snpmap, chromosomes)
snpmap |
Data frame with SNP map info (must contain columns 'Chromosome' and 'Name'). |
chromosomes |
Vector of chromosome identifiers to filter. |
Character vector with SNP names.
snpmap <- data.frame(Chromosome = c(1, 1, 2), Name = c("SNP1", "SNP2", "SNP3"))
check.snp.chromo(snpmap, 1)
Identifies SNPs deviating from HWE beyond a z-score threshold.
check.snp.hwe(snp.summary, max.dev)
snp.summary |
Data frame with SNP summary (must contain 'z.HWE' column). |
max.dev |
Maximum z-score allowed. |
Character vector with SNP names deviating from HWE. Returns 'NULL' if none.
df <- data.frame(z.HWE = c(2, 5), row.names = c("SNP1", "SNP2"))
check.snp.hwe(df, 3)
This function identifies SNP markers whose Hardy-Weinberg equilibrium (HWE) chi-square p-values
indicate significant deviation beyond a specified threshold. It uses the p-values computed by
get.hwe.chi2 on the input summary data frame.
check.snp.hwe.chi2(snp.summary, max.dev)
snp.summary |
A data frame or matrix containing summary statistics for SNP markers.
The row names should correspond to SNP identifiers. It must be compatible with
the function get.hwe.chi2. |
max.dev |
A numeric value specifying the maximum acceptable p-value threshold. SNPs with p-values below this threshold are considered as deviating from HWE. |
Any SNP with missing p-value (NA) is treated as not failing (returned as FALSE).
A character vector of SNP identifiers (rownames) that fail the HWE test (p-value < max.dev).
If no SNPs fail, an empty vector is returned.
snp.summary <- data.frame(
  Calls = c(100, 100),
  P.AA = c(0.25, 0.7),
  P.AB = c(0.50, 0.05),
  P.BB = c(0.25, 0.25),
  row.names = c("SNP1", "SNP2")
)
check.snp.hwe.chi2(snp.summary, max.dev = 0.05)
Identifies SNPs with minor allele frequency below a minimum threshold.
check.snp.maf(snp.summary, min.maf)
snp.summary |
Data frame with SNP summary (must contain 'MAF' column). |
min.maf |
Minimum MAF allowed. |
Character vector with SNP names below threshold. Returns 'NULL' if none.
df <- data.frame(MAF = c(0.01, 0.2), row.names = c("SNP1", "SNP2"))
check.snp.maf(df, 0.05)
Identifies SNPs with genotype frequencies below a minimum threshold.
check.snp.mgf(snp.summary, min.mgf)
snp.summary |
Data frame with columns 'P.AA', 'P.AB', 'P.BB'. |
min.mgf |
Minimum genotype frequency allowed. |
Character vector with SNP names below threshold. Returns 'NULL' if none.
df <- data.frame(P.AA = c(0.01, 0.5), P.AB = c(0.02, 0.4), P.BB = c(0.01, 0.1))
rownames(df) <- c("SNP1", "SNP2")
check.snp.mgf(df, 0.05)
Identifies SNPs considered monomorphic.
check.snp.monomorf(snp.summary)
snp.summary |
Data frame with columns 'P.AA', 'P.AB', 'P.BB'. |
Character vector with monomorphic SNP names. Returns 'NULL' if none.
df <- data.frame(P.AA = c(1, 0.5), P.AB = c(0, 0.5), P.BB = c(0, 0))
rownames(df) <- c("SNP1", "SNP2")
check.snp.monomorf(df)
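To make the link between the genotype-frequency columns and monomorphism concrete, the sketch below derives the minor allele frequency from 'P.AA' and 'P.AB' and flags SNPs whose MAF is zero. Taking "monomorphic" to mean MAF equal to zero is an assumption; the package's exact criterion is not stated here. The name `maf_from_freqs` is hypothetical.

```r
# Hypothetical sketch (not the package code): allele frequency of A is
# P.AA + P.AB / 2; the MAF is the smaller of that value and its complement.
maf_from_freqs <- function(snp.summary) {
  p_a <- snp.summary$P.AA + snp.summary$P.AB / 2
  maf <- pmin(p_a, 1 - p_a)
  names(maf) <- rownames(snp.summary)
  maf
}

df <- data.frame(P.AA = c(1, 0.5), P.AB = c(0, 0.5), P.BB = c(0, 0))
rownames(df) <- c("SNP1", "SNP2")
maf <- maf_from_freqs(df)
mono <- names(maf)[maf == 0]   # assumption: monomorphic means MAF == 0
print(mono)
```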
Identifies SNPs with position equal to zero in the SNP map.
check.snp.no.position(snpmap)
snpmap |
Data frame with columns 'Position' and 'Name'. |
Character vector with SNP names without position. Returns 'NULL' if none.
df <- data.frame(Position = c(0, 100), Name = c("SNP1", "SNP2"))
check.snp.no.position(df)
Identifies groups of SNPs that are mapped to the exact same genomic position on each chromosome. Returns a list where each element corresponds to one group of overlapping SNPs.
check.snp.same.position(snpmap)
snpmap |
Data frame with columns 'Chromosome', 'Position', and 'Name'. |
A list of character vectors, each with names of SNPs found at the same position.
df <- data.frame(Chromosome = c(1, 1, 2), Position = c(100, 100, 200),
                 Name = c("SNP1", "SNP2", "SNP3"))
check.snp.same.position(df)
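One way to find such groups in base R is to build a chromosome-position key and split SNP names by it, as sketched below. This is an illustration and not necessarily how the package does it; the name `same_position_sketch` is hypothetical.

```r
# Hypothetical sketch (not the package code): group SNP names by a
# "chromosome:position" key and keep only keys shared by several SNPs.
same_position_sketch <- function(snpmap) {
  key <- paste(snpmap$Chromosome, snpmap$Position, sep = ":")
  groups <- split(as.character(snpmap$Name), key)
  unname(groups[lengths(groups) > 1])
}

df <- data.frame(Chromosome = c(1, 1, 2), Position = c(100, 100, 200),
                 Name = c("SNP1", "SNP2", "SNP3"))
dups <- same_position_sketch(df)
print(dups)
```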
This function merges a list of SNPDataLong objects, typically representing different SNP panels
or datasets, into a single unified SNPDataLong object. It ensures that all genotype matrices
have the same set of SNPs (filling missing SNPs with NA), and merges the marker map information while
removing duplicate SNP entries.
combineSNPData(lista)
lista |
A list of SNPDataLong objects. |
A single SNPDataLong object containing the combined genotype matrix, merged map,
and a concatenated path string.
make_obj <- function(samples, snps) {
  m <- methods::new("SnpMatrix",
    matrix(as.raw(1:3), nrow = length(samples), ncol = length(snps),
           dimnames = list(samples, snps)))
  methods::new("SNPDataLong",
    geno = m,
    map = data.frame(Name = snps, Chromosome = 1, Position = seq_along(snps)),
    path = tempfile(),
    xref_path = "chip1")
}
obj1 <- make_obj(c("S1", "S2"), c("SNP1", "SNP2"))
obj2 <- make_obj(c("S3", "S4"), c("SNP2", "SNP3"))
combined <- combineSNPData(list(obj1, obj2))
Performs PCA using the genomic relationship matrix (GRM).
doPCA(genotypes)
genotypes |
Genotype matrix. |
List containing 'pcs' (principal components) and 'eigen' (eigenvalues).
set.seed(1)
mat <- matrix(sample(as.raw(1:3), 200, TRUE), nrow = 10, ncol = 20)
geno <- methods::new("SnpMatrix", mat)
rownames(geno) <- paste0("S", 1:10)
colnames(geno) <- paste0("SNP", 1:20)
res <- doPCA(geno)
str(res)
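For readers unfamiliar with GRM-based PCA, the sketch below shows the general idea on a plain 0/1/2 matrix: centre each SNP by twice its allele frequency, form a VanRaden-type relationship matrix, and take its eigendecomposition. The choice of the VanRaden scaling, the mean imputation of missing values, and the name `grm_pca_sketch` are all assumptions for illustration; the package's own computation may differ.

```r
# Hypothetical sketch (not the package code): PCA from a VanRaden-type GRM.
# M is a samples x SNPs numeric matrix coded 0/1/2.
grm_pca_sketch <- function(M) {
  p <- colMeans(M, na.rm = TRUE) / 2            # allele frequency per SNP
  Z <- sweep(M, 2, 2 * p)                       # centre each SNP column
  Z[is.na(Z)] <- 0                              # assumption: mean-impute missing
  G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))   # samples x samples GRM
  e <- eigen(G, symmetric = TRUE)
  pcs <- e$vectors                              # eigenvectors, one row per sample
  rownames(pcs) <- rownames(M)
  list(pcs = pcs, eigen = e$values)
}

set.seed(1)
M <- matrix(sample(0:2, 200, TRUE), nrow = 10,
            dimnames = list(paste0("S", 1:10), paste0("SNP", 1:20)))
res <- grm_pca_sketch(M)
str(res)
```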
Generates exploratory plots: MAF histograms, HWE plots, heterozygosity scatter, MDS, and dendrogram.
exploratory.plots(
  snp.summary,
  snps.plot,
  sample.summary,
  samples.plot,
  distm,
  glabels,
  mds.plot,
  hierq.plot
)
snp.summary |
Data frame with SNP summary. |
snps.plot |
Filename for SNP histogram plot. |
sample.summary |
Data frame with sample summary. |
samples.plot |
Filename for heterozygosity plot. |
distm |
Distance matrix for samples. |
glabels |
Sample labels for plots. |
mds.plot |
Filename for MDS plot. |
hierq.plot |
Filename for hierarchical cluster plot. |
NULL (plots are saved as JPEG files)
tmp <- tempfile(fileext = ".jpg")
snp.summary <- data.frame(
  MAF = runif(20), z.HWE = rnorm(20), Calls = rep(100, 20),
  P.AA = runif(20, 0, 0.5), P.AB = runif(20, 0, 0.5), P.BB = runif(20, 0, 0.5)
)
sample.summary <- data.frame(
  Call.rate = runif(5, 0.9, 1),
  Heterozygosity = runif(5, 0.2, 0.4),
  row.names = paste0("S", 1:5)
)
distm <- stats::dist(matrix(rnorm(25), nrow = 5))
exploratory.plots(snp.summary,
  snps.plot = tempfile(fileext = ".jpg"),
  sample.summary = sample.summary,
  samples.plot = tempfile(fileext = ".jpg"),
  distm = distm,
  glabels = paste0("S", 1:5),
  mds.plot = tempfile(fileext = ".jpg"),
  hierq.plot = tempfile(fileext = ".jpg"))
A class to handle export preparation for FImpute.
geno |
A SnpMatrix or NULL containing genotype data. |
map |
A data.frame containing marker information. |
path |
Output file path. |
name |
Project or file name. |
A convenience function to construct a 'FImputeRunner' object from a 'SNPDataLong' object.
FImputeRunner(object, path, exec_path = "FImpute3", name = "data")
object |
An object of class 'SNPDataLong', from which 'geno' and 'map' slots will be extracted. |
path |
A character string indicating the directory to save FImpute files. |
exec_path |
Path to the FImpute executable (default = "FImpute3"). |
name |
Name for the dataset (used internally, default = "data"). |
An object of class 'FImputeRunner'.
A class to manage FImpute execution and results.
export |
An FImputeExport object. |
par_file |
Path to parameter file. |
exec_path |
Path to FImpute executable. |
results |
A data.frame containing results or summary information. |
Converts the genotype matrix (geno slot) of a SNPDataLong object to a data.frame, with optional centering and scaling per SNP (column).
genoToDF(object, center = FALSE, scale = FALSE)
object |
An object of class SNPDataLong. |
center |
Logical or numeric. If TRUE (default FALSE), center columns to mean zero. |
scale |
Logical or numeric. If TRUE (default FALSE), scale columns to standard deviation one. |
A data.frame with individuals as rows and SNPs as columns (numeric 0/1/2, or centered/scaled values).
set.seed(1)
raw_mat <- matrix(as.raw(sample(1:3, 100, TRUE)), nrow = 10, ncol = 10)
rownames(raw_mat) <- paste0("S", 1:10)
colnames(raw_mat) <- paste0("SNP", 1:10)
geno <- methods::new("SnpMatrix", raw_mat)
obj <- methods::new("SNPDataLong",
  geno = geno,
  map = data.frame(Name = colnames(geno), Chromosome = 1, Position = 1:10),
  path = tempfile(),
  xref_path = "chip1")
df <- genoToDF(obj, center = TRUE, scale = TRUE)
head(df[, 1:5])
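The effect of per-SNP centering and scaling can be seen on a plain numeric matrix with base R's `scale()`. Whether `genoToDF()` uses `scale()` internally is an assumption; the sketch only illustrates what "mean zero, standard deviation one per column" means for 0/1/2 genotypes.

```r
# Illustration with base R only (does not call the package).
# Small samples x SNPs matrix coded 0/1/2; no column is constant,
# so every column has a non-zero standard deviation.
M <- matrix(c(0, 1, 2, 1,
              2, 2, 0, 1,
              0, 0, 1, 2),
            nrow = 4,
            dimnames = list(paste0("S", 1:4), paste0("SNP", 1:3)))

# Centre each column to mean zero and scale it to unit standard deviation
df_scaled <- as.data.frame(scale(M, center = TRUE, scale = TRUE))
print(round(df_scaled, 3))
```

A monomorphic SNP has zero standard deviation, so scaling such a column with `scale()` yields NaN; removing monomorphic SNPs beforehand avoids this.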
Calculates genotype correlation using a fast check (fc) method.
get.correl.fc(g1, g2)
g1 |
Genotype vector. |
g2 |
Genotype vector. |
Numeric value of correlation.
g1 <- sample(0:2, 10, TRUE)
g2 <- sample(0:2, 10, TRUE)
get.correl.fc(g1, g2)
Infers gender using heterozygosity thresholds.
get.gender(sample.summary, threshM, threshF)
sample.summary |
Data frame with 'Heterozygosity' column. |
threshM |
Numeric threshold for males. |
threshF |
Numeric threshold for females. |
Data frame with columns 'heterozygosity' and 'sex'.
df <- data.frame(Heterozygosity = c(0.1, 0.3, 0.6))
rownames(df) <- c("A", "B", "C")
get.gender(df, 0.2, 0.5)
Calculates Hardy-Weinberg equilibrium chi-square p-values for SNPs.
get.hwe.chi2(snp.summary)
snp.summary |
Data frame with columns 'Calls', 'P.AA', 'P.AB', 'P.BB'. |
Numeric vector with p-values.
df <- data.frame(Calls = c(100, 100), P.AA = c(0.6, 0.4),
                 P.AB = c(0.3, 0.4), P.BB = c(0.1, 0.2))
get.hwe.chi2(df)
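The textbook HWE chi-square test can be computed directly from these four columns, as sketched below: allele frequency p = P.AA + P.AB/2, expected genotype counts n·p², 2·n·p·q and n·q², and a p-value from a chi-square distribution with one degree of freedom. This is the standard test written out for illustration; it is an assumption that the package computes its p-values the same way. The name `hwe_chi2_sketch` is hypothetical.

```r
# Hypothetical sketch (not the package code): standard 1-df HWE chi-square.
hwe_chi2_sketch <- function(snp.summary) {
  n <- snp.summary$Calls
  p <- snp.summary$P.AA + snp.summary$P.AB / 2   # frequency of allele A
  q <- 1 - p
  obs  <- cbind(n * snp.summary$P.AA, n * snp.summary$P.AB, n * snp.summary$P.BB)
  expd <- cbind(n * p^2, 2 * n * p * q, n * q^2)
  chi2 <- rowSums((obs - expd)^2 / expd)         # NaN for monomorphic SNPs
  stats::pchisq(chi2, df = 1, lower.tail = FALSE)
}

df <- data.frame(Calls = c(100, 100),
                 P.AA = c(0.25, 0.7), P.AB = c(0.50, 0.05), P.BB = c(0.25, 0.25))
pvals <- hwe_chi2_sketch(df)
print(pvals)
```

The first SNP sits exactly at HWE proportions, while the second has a strong heterozygote deficit and would fail a 0.05 threshold.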
Allows flexible import of SNP genotype data from Illumina FinalReport files,
using fast initial column detection via data.table::fread, followed by
full genotype matrix construction with snpStats::read.snps.long.
getGeno(...)

## S4 method for signature 'ANY'
getGeno(
  path,
  fields = list(sample = 2, snp = 1, allele1 = 7, allele2 = 8, confidence = 9),
  codes = c("A", "B"),
  threshold = 0.15,
  sep = "\t",
  skip = 0,
  verbose = TRUE,
  every = NULL
)
... |
Additional optional arguments. |
path |
Path to the directory containing |
fields |
List specifying column indices (sample, snp, allele1, allele2, confidence) |
codes |
Allele codes (e.g., c("A", "B")) |
threshold |
Confidence threshold |
sep |
Field separator |
skip |
Lines to skip |
verbose |
Logical; show progress |
every |
Frequency for progress updates |
An SNPDataLong object
Calculates IBS mean and standard deviation between two samples.
ibs.pair(g1, g2)
g1 |
Genotype vector for first sample. |
g2 |
Genotype vector for second sample. |
Numeric vector: [mean IBS, standard deviation].
g1 <- sample(0:2, 10, TRUE)
g2 <- sample(0:2, 10, TRUE)
ibs.pair(g1, g2)
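A common per-SNP definition of identity by state for 0/1/2 genotypes is the number of shared alleles, 2 - |g1 - g2|. The sketch below summarizes that quantity by its mean and standard deviation. Both the coding and this IBS definition are assumptions; the package may score IBS differently. The name `ibs_pair_sketch` is hypothetical.

```r
# Hypothetical sketch (not the package code). Assumptions:
#  - genotypes coded 0/1/2 (copies of one allele)
#  - per-SNP IBS = 2 - |g1 - g2|, i.e. number of alleles shared by state
ibs_pair_sketch <- function(g1, g2) {
  ok <- !is.na(g1) & !is.na(g2)          # use SNPs typed in both samples
  ibs <- 2 - abs(g1[ok] - g2[ok])
  c(mean = mean(ibs), sd = stats::sd(ibs))
}

res <- ibs_pair_sketch(c(0, 1, 2, 2), c(0, 1, 0, 1))
print(res)
```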
Reads and imports multiple genotype datasets specified in a list of configurations. Each configuration must include the path to the genotype data and information on field mapping. Optionally, you can also specify codes, quality threshold, separator, lines to skip, and a subset of IDs to retain. The function automatically fills the 'xref_path' slot per individual and combines maps into a single data.frame, adding a 'SourcePath' column indicating their origin and removing duplicated SNP rows (by Name). Prints progress messages indicating the current path being loaded (with counter).
import_geno_list(config_list)
config_list |
A list of configuration lists. Each element should contain:
- 'path' (character): Path to the genotype file or folder.
- 'fields' (list): Named list defining the columns (e.g., SNP ID, sample ID, alleles, confidence).
- 'codes' (character vector, optional): Allele codes (default is c("A", "B")).
- 'threshold' (numeric, optional): Maximum allowed missingness or confidence threshold (default 0.15).
- 'sep' (character, optional): Field separator in the input file (default "tab-delimited").
- 'skip' (integer, optional): Number of lines to skip at the beginning of the file (default 0).
- 'verbose' (logical, optional): Whether to print detailed messages (default TRUE).
- 'subset' (character vector, optional): Vector of sample IDs to retain after import. |
An object of class 'SNPDataLong' containing:
- Combined genotype matrix ('geno').
- Combined map ('map') as a single data.frame with 'SourcePath' column and without duplicated rows.
- Combined 'xref_path' vector (one entry per individual).
- 'path' slot as a semicolon-separated string of all input dataset paths.
Imports genotype data from multiple configurations defined in an
SNPImportList object and combines them into a unified SNPDataLong object.
importAllGenos(object)

## S4 method for signature 'SNPImportList'
importAllGenos(object)
object |
An SNPImportList object. |
A combined SNPDataLong object.
Reads existing imputed results from a given path and returns an object of class SNPDataLong.
importFImputeResults(path, method = "R")
path |
Character. Path to the folder containing 'output_fimpute' (e.g., "fimpute_run_nelore"). |
method |
Character. "R" (default) or "Rcpp". Passed to read.fimpute(). |
An object of class SNPDataLong containing the imputed genotypes and SNP map.
Groups sample pairs into sets of related samples.
pairs2sets(pairs)
pairs |
Matrix or list of sample pairs. |
List of sets of samples.
pairs <- matrix(c("A", "B", "B", "C", "D", "E"), ncol = 2, byrow = TRUE)
pairs2sets(pairs)
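Grouping pairs into sets amounts to finding connected components: any two pairs that share a sample end up in the same set. The sketch below does this with a simple merge loop in base R. It is an illustration for a two-column matrix of pairs, not the package's implementation, and the name `pairs_to_sets_sketch` is hypothetical.

```r
# Hypothetical sketch (not the package code): merge pairs that share a
# member until each sample belongs to exactly one set.
pairs_to_sets_sketch <- function(pairs) {
  sets <- list()
  for (i in seq_len(nrow(pairs))) {
    pair <- as.character(pairs[i, ])
    # Which existing sets already contain one of the two samples?
    hit <- vapply(sets, function(s) any(pair %in% s), logical(1))
    # Fuse those sets with the new pair and keep the untouched ones
    merged <- unique(c(unlist(sets[hit]), pair))
    sets <- c(sets[!hit], list(merged))
  }
  lapply(sets, sort)
}

pairs <- matrix(c("A", "B", "B", "C", "D", "E"), ncol = 2, byrow = TRUE)
sets_out <- pairs_to_sets_sketch(pairs)
print(sets_out)
```

Here A-B and B-C share sample B, so they collapse into one set, while D-E stays separate.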
Plots PCA groups from an anticlustering result.
plotPCAgroups(pca_res, groups, pcs = c(1, 2), filename = NULL)
pca_res |
A prcomp object. |
groups |
A factor or vector of group assignments. |
pcs |
Vector of length 2 indicating which PCs to plot (default: c(1, 2)). |
filename |
Optional. If provided, saves plot to this file (e.g., "antic.png"). |
A ggplot object (also prints to screen).
set.seed(1)
pca_res <- stats::prcomp(matrix(rnorm(200), nrow = 20))
groups <- sample(1:2, 20, replace = TRUE)
plotPCAgroups(pca_res, groups)
Displays the contents of a summary.SNPDataLong object on the console.
## S3 method for class 'summary.SNPDataLong'
print(x, ...)
x |
An object of class summary.SNPDataLong. |
... |
Further arguments (currently unused). |
The input x, returned invisibly.
Prints a formatted message with a border for section titles in the console.
qc_header(title)
title |
Character string to be printed inside the header box. |
No return value. Used for side effects (message).
qc_header("Quality Control on Samples")
Applies quality control (QC) procedures to samples in a 'SNPDataLong' object, based on heterozygosity and call rate thresholds.
qcSamples(x, ...)

## S4 method for signature 'SNPDataLong'
qcSamples(
  x,
  heterozygosity = NULL,
  smp_cr = NULL,
  action = c("report", "filter", "both")
)
x |
An object of class 'SNPDataLong'. |
... |
Additional optional arguments. |
heterozygosity |
A numeric threshold or range for heterozygosity. Samples outside this threshold are removed. |
smp_cr |
Minimum acceptable sample call rate (between 0 and 1). Samples below this value are removed. |
action |
Character string indicating the action to perform. One of:
- '"report"': only returns a list of samples to remove and those kept;
- '"filter"': returns a filtered object without reporting;
- '"both"': performs filtering and returns both the filtered object and the report. |
Depending on the 'action' argument:
- '"report"': returns a list with removed and kept samples;
- '"filter"': returns a new 'SNPDataLong' object with filtered genotypes;
- '"both"': returns a list with:
  - 'filtered': the filtered 'SNPDataLong' object;
  - 'report': a list of removed and kept samples.
Applies flexible quality control filters on an object of class SNPDataLong.
Supports call rate filtering, minor allele frequency (MAF), Hardy-Weinberg equilibrium (HWE),
removal of monomorphic SNPs, exclusion of specific chromosomes, optionally removing SNPs without positions,
and optionally removing SNPs at the same genomic position (keeping the one with highest MAF).
qcSNPs(x, ...)

## S4 method for signature 'SNPDataLong'
qcSNPs(
  x,
  missing_ind = NULL,
  missing_snp = NULL,
  min_snp_cr = NULL,
  min_maf = NULL,
  hwe = NULL,
  snp_position = NULL,
  no_position = NULL,
  snp_mono = FALSE,
  remove_chr = NULL,
  action = c("report", "filter", "both")
)
x |
An object of class SNPDataLong. |
... |
Additional optional arguments. |
missing_ind |
Maximum allowed proportion of missing data per individual (currently not implemented). |
missing_snp |
Maximum allowed proportion of missing data per SNP (currently not implemented). |
min_snp_cr |
Minimum acceptable call rate for SNPs (e.g., 0.95). SNPs below this threshold are removed. |
min_maf |
Minimum minor allele frequency allowed for SNPs (e.g., 0.05). SNPs with lower MAF are removed. |
hwe |
p-value threshold for Hardy-Weinberg equilibrium test (e.g., 1e-6). SNPs violating this are removed. |
snp_position |
Logical. If TRUE, removes SNPs mapped to the same position, retaining only the one with highest MAF. |
no_position |
Logical. If TRUE, removes SNPs without defined genomic positions. |
snp_mono |
Logical. If TRUE, removes monomorphic SNPs (with no variation). |
remove_chr |
Character vector of chromosomes to exclude (e.g., c("X", "Y")). |
action |
One of "report" (returns a list of removed SNPs), "filter" (returns filtered SNPDataLong), or "both" (returns both). |
Depending on the action argument:
- "report": list of SNPs removed by each filter and SNPs retained.
- "filter": filtered SNPDataLong object.
- "both": list containing the filtered object and detailed report.
set.seed(123)
raw_mat <- matrix(as.raw(sample(1:3, 100, TRUE)), nrow = 10, ncol = 10)
colnames(raw_mat) <- paste0("snp", 1:10)
rownames(raw_mat) <- paste0("ind", 1:10)
geno <- methods::new("SnpMatrix", raw_mat)
map <- data.frame(Name = colnames(geno), Chromosome = 1, Position = 1:10)
x <- methods::new("SNPDataLong", geno = geno, map = map,
                  path = tempfile(), xref_path = "chip1")
qcSNPs(x, min_snp_cr = 0.8, min_maf = 0.05, snp_mono = TRUE,
       no_position = TRUE, snp_position = TRUE, action = "filter")
This function performs a row-wise binding of multiple SnpMatrix objects,
explicitly preserving row names and column names, avoiding unexpected "object has no names" warnings.
rbind_SnpMatrix(...)
... |
SnpMatrix objects to combine (must have identical column names). |
A single combined SnpMatrix with preserved row and column names.
m1 <- methods::new("SnpMatrix",
  matrix(as.raw(1:3), nrow = 2, ncol = 3,
         dimnames = list(c("S1", "S2"), c("SNP1", "SNP2", "SNP3"))))
m2 <- methods::new("SnpMatrix",
  matrix(as.raw(1:3), nrow = 2, ncol = 3,
         dimnames = list(c("S3", "S4"), c("SNP1", "SNP2", "SNP3"))))
rbind_SnpMatrix(m1, m2)
Combines multiple SnpMatrix objects by rows, automatically handling differing SNP columns, optimized for large matrices.
rbindSnpFlexible(...)

| Argument | Description |
|---|---|
| ... | One or more SnpMatrix objects. |
A single SnpMatrix object with all rows combined.
m1 <- methods::new("SnpMatrix", matrix(as.raw(1:3), nrow = 2, ncol = 3, dimnames = list(c("S1", "S2"), c("SNP1", "SNP2", "SNP3"))))
m2 <- methods::new("SnpMatrix", matrix(as.raw(1:3), nrow = 2, ncol = 2, dimnames = list(c("S3", "S4"), c("SNP2", "SNP4"))))
rbindSnpFlexible(m1, m2)
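One way to picture row-binding with differing SNP columns is to take the union of all column names and pad the SNPs a matrix lacks with the missing code. The sketch below does this on plain raw matrices in base R. The helper `rbind_union_cols` is hypothetical, and padding with missing calls is an assumption about the behaviour, not the package's documented implementation. It also assumes sample IDs are unique across inputs.

```r
# Hypothetical helper (not part of SNPkit): row-bind raw genotype matrices
# whose SNP columns differ, padding absent SNPs with the missing code.
rbind_union_cols <- function(...) {
  mats <- list(...)
  all_snps <- unique(unlist(lapply(mats, colnames)))
  all_ids <- unlist(lapply(mats, rownames))
  # as.raw(0) is the missing-call code in snpStats' raw encoding
  out <- matrix(as.raw(0), nrow = length(all_ids), ncol = length(all_snps),
                dimnames = list(all_ids, all_snps))
  for (m in mats) {
    # Fill each block by name so column order in the inputs does not matter
    out[rownames(m), colnames(m)] <- m
  }
  out
}

m1 <- matrix(as.raw(rep(1:3, 2)), nrow = 2, ncol = 3,
             dimnames = list(c("S1", "S2"), c("SNP1", "SNP2", "SNP3")))
m2 <- matrix(as.raw(c(1, 2, 3, 1)), nrow = 2, ncol = 2,
             dimnames = list(c("S3", "S4"), c("SNP2", "SNP4")))
combined <- rbind_union_cols(m1, m2)
dim(combined)
```

The result has four rows and four columns; S3 and S4 carry the missing code for SNP1 and SNP3, and S1 and S2 carry it for SNP4.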
Reads imputed genotypes and SNP information from FImpute output, builds a SnpMatrix and a corresponding map, and returns an SNPDataLong object.
read.fimpute(file, method = c("R", "Rcpp"))

| Argument | Description |
|---|---|
| file | Character. Path to the FImpute output directory (usually "output_fimpute"). |
| method | Character. "R" (default) for vectorized R implementation, or "Rcpp" for compiled C++ implementation. |
An object of class SNPDataLong with three slots:
geno (a SnpMatrix with individuals as rows and SNPs as
columns), map (a data.frame with columns Name,
Chromosome, and Position), and path (the input
directory).
## Not run:
# Requires a directory containing FImpute output files
# (genotypes_imp.txt and snp_info.txt).
snp_long <- read.fimpute("output_fimpute", method = "R")
## End(Not run)
This function runs the ADMIXTURE program on a set of PLINK files (.bed/.bim/.fam) located in a specified directory, using a given file prefix. It supports both unsupervised and supervised analyses, optional cross-validation, and custom output file prefixes to avoid overwriting results.
run_admixture(
  path,
  prefix,
  admixture_path = "admixture",
  K,
  supervised = FALSE,
  pop_assignments = NULL,
  extra_args = NULL,
  out_prefix = NULL,
  cv = NULL
)

| Argument | Description |
|---|---|
| path | Character. Path to the folder containing PLINK files. |
| prefix | Character. File prefix (without extension). The function will look for '<prefix>.bed', '<prefix>.bim', and '<prefix>.fam' in 'path'. |
| admixture_path | Character. Path to the ADMIXTURE executable, or "admixture" if in system PATH. Default is "admixture". |
| K | Integer. Number of ancestral populations to estimate. |
| supervised | Logical. If TRUE, runs ADMIXTURE in supervised mode (requires |
| pop_assignments | Character vector. Population assignments for each individual (length equal to number of individuals in '.fam'). Use |
| extra_args | Character vector. Additional arguments to pass to ADMIXTURE (e.g., other flags). Default is NULL. |
| out_prefix | Character. Optional prefix for renaming output files (.Q, .P, .log) after the run completes. Default is NULL. |
| cv | Integer. Number of folds for cross-validation (e.g., 5 or 10). If provided, adds |
When supervised = TRUE, a '.pop' file is automatically created in the specified directory.
Each line in this file corresponds to one individual, containing the population name or "-" for missing assignments.
If out_prefix is provided, the function renames the standard ADMIXTURE output files
(e.g., '<prefix>.3.Q') to use this prefix (e.g., 'myrun.Q').
The function only works on Linux or macOS systems.
No value returned. Runs ADMIXTURE as a side effect. Generates output files in the specified directory. Messages indicate progress and output file names.
## Not run:
# Requires the external ADMIXTURE binary and PLINK files prepared beforehand.
work_dir <- file.path(tempdir(), "admixture_demo")
run_admixture(
  path = work_dir,
  prefix = "plink_data",
  admixture_path = "admixture",
  K = 3,
  out_prefix = "run1_k3"
)
pop_vec <- c("A", "A", "B", "B", "-", "-", "A", "B", "A", "-")
run_admixture(
  path = work_dir,
  prefix = "plink_data",
  admixture_path = "admixture",
  K = 3,
  supervised = TRUE,
  pop_assignments = pop_vec,
  cv = 10,
  out_prefix = "supervised_k3_cv10"
)
## End(Not run)
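The '.pop' file and command line for a supervised run can be pictured with a few lines of base R. This is a sketch only: the directory name `admixture_sketch` is hypothetical, and the way run_admixture assembles its arguments internally, including their order, is an assumption. The `--cv` and `--supervised` flags and the '<prefix>.pop' naming are the ones described in the ADMIXTURE manual.

```r
# Sketch only: nothing here calls the ADMIXTURE binary.
work_dir <- file.path(tempdir(), "admixture_sketch")  # hypothetical folder
dir.create(work_dir, showWarnings = FALSE, recursive = TRUE)
prefix <- "plink_data"
K <- 3
cv <- 10
pop_vec <- c("A", "A", "B", "B", "-", "-", "A", "B", "A", "-")

# One line per individual, in .fam order; "-" marks a missing assignment
pop_file <- file.path(work_dir, paste0(prefix, ".pop"))
writeLines(pop_vec, pop_file)

# Arguments for a supervised run with 10-fold cross-validation
# (the argument order is an assumption)
args <- c(paste0("--cv=", cv), paste0(prefix, ".bed"), K, "--supervised")
cmd <- paste("admixture", paste(args, collapse = " "))
cmd

# To execute from within work_dir (requires the binary and PLINK files):
# system2("admixture", args)
```

Building the argument vector separately from the call makes it easy to inspect the exact command before anything is executed.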
Converts a SNPDataLong object to a data.frame, runs PCA, and performs anticlustering on the selected principal components.
runAnticlusteringPCA(object, K = 2, n_pcs = 20, center = TRUE, scale = TRUE)

| Argument | Description |
|---|---|
| object | An object of class |
| K | Number of groups for anticlustering, or a vector of group sizes (as in anticlust). |
| n_pcs | Number of top principal components to use. If |
| center | Logical or numeric. Passed to |
| scale | Logical or numeric. Passed to |
A list with components:
Integer vector with anticlustering group assignments.
The PCA result object (from stats::prcomp).
Numeric matrix of the PCs used for anticlustering.
res <- runAnticlusteringPCA(nelore_imputed, K = 2, n_pcs = 0.8)
table(res$groups)
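The PCA step can be sketched in base R with stats::prcomp. The example uses n_pcs = 0.8; the sketch below assumes that a value below 1 means "keep the fewest components whose cumulative explained variance reaches that proportion". That reading is an assumption, as is the toy genotype matrix `geno_num`. The anticlustering call is shown only as a comment because it depends on the anticlust package.

```r
set.seed(42)
# Toy numeric genotype matrix: 30 individuals x 12 SNPs coded 0/1/2
geno_num <- matrix(sample(0:2, 30 * 12, replace = TRUE), nrow = 30,
                   dimnames = list(paste0("ind", 1:30), paste0("snp", 1:12)))

# Drop constant columns, which cannot be scaled to unit variance
geno_num <- geno_num[, apply(geno_num, 2, var) > 0, drop = FALSE]

# Note that prcomp() names its scaling argument `scale.`
pca <- stats::prcomp(geno_num, center = TRUE, scale. = TRUE)

# Assumed rule: keep the fewest PCs whose cumulative variance reaches 0.8
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
n_keep <- which(cumsum(var_explained) >= 0.8)[1]
pcs <- pca$x[, seq_len(n_keep), drop = FALSE]
dim(pcs)

# The grouping step would then be delegated to the anticlust package, e.g.
# groups <- anticlust::anticlustering(pcs, K = 2)
```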
This function runs the external FImpute software using a 'FImputeRunner' object, ensuring that all required input files are present and the results are imported.
runFImpute(object, verbose = TRUE)

## S4 method for signature 'FImputeRunner'
runFImpute(object, verbose = TRUE)

| Argument | Description |
|---|---|
| object | An object of class 'FImputeRunner'. |
| verbose | Logical. If TRUE (default), FImpute output will be printed to the console. |
An updated 'FImputeRunner' object with the 'results' slot populated (SNPDataLong).
## Not run:
# Requires the external FImpute3 binary in PATH.
# geno_obj is assumed to be an existing object with geno and map slots
# (for example, an SNPDataLong object).
path_fimpute <- file.path(tempdir(), "fimpute_run_example")
param_file <- file.path(path_fimpute, "fimpute.par")
export_obj <- methods::new("FImputeExport", geno = geno_obj@geno, map = geno_obj@map, path = path_fimpute)
runner <- methods::new("FImputeRunner", export = export_obj, par_file = param_file, exec_path = "FImpute3")
res <- runFImpute(runner, verbose = TRUE)
## End(Not run)
S4 method to export genotype (.gen), map (.map), and parameter (fimpute.par) files compatible with [FImpute](https://www.aps.uoguelph.ca/~msargol/fimpute/).
saveFImpute(object, ...)

## S4 method for signature 'FImputeExport'
saveFImpute(object)

## S4 method for signature 'SNPDataLong'
saveFImpute(object, path)

| Argument | Description |
|---|---|
| object | An object of class 'FImputeExport' or 'SNPDataLong'. |
| ... | Additional arguments passed to methods. |
| path | Output directory. Must be supplied by the caller (e.g. a path inside |
No return value, called for side effects. The function writes the
files data.gen, data.map, and fimpute.par to the
directory path and returns NULL invisibly.
Convenience function to export FImpute files directly from a 'SnpMatrix' and map 'data.frame'.
saveFImputeRaw(geno, map, path, xref = NULL)

| Argument | Description |
|---|---|
| geno | A 'SnpMatrix' object. |
| map | A data.frame with columns 'Name', 'Chromosome', 'Position', and 'SourcePath'. |
| path | Output directory. |
| xref | Optional vector of identifiers per individual (used to assign numeric chip IDs). |
No return value, called for side effects. The function writes
three files (data.gen, data.map, and fimpute.par) to
the directory specified by path and returns NULL invisibly.
Saves genotype and map data from an SNPDataLong object in PLINK format (.ped/.map and optionally binary files).
savePlink(
  object,
  path,
  name = "plink_data",
  run_plink = TRUE,
  chunk_size = 1000
)

| Argument | Description |
|---|---|
| object | An object of class SNPDataLong. |
| path | Character. Directory where files will be saved. Must be supplied by the caller (e.g. a folder inside |
| name | Character. Base name for PLINK output files. |
| run_plink | Logical. If TRUE (default), runs PLINK1 to convert to binary files. If FALSE, only .ped and .map files are saved. |
| chunk_size | Integer. Number of individuals per chunk for writing .ped file (default: 1000). |
No return value, called for side effects. Files (.ped/.map,
and .bed/.bim/.fam when run_plink = TRUE) are
written under path.
set.seed(1)
raw_mat <- matrix(as.raw(sample(1:3, 100, TRUE)), nrow = 10, ncol = 10)
rownames(raw_mat) <- paste0("S", 1:10)
colnames(raw_mat) <- paste0("SNP", 1:10)
geno <- methods::new("SnpMatrix", raw_mat)
obj <- methods::new("SNPDataLong", geno = geno, map = data.frame(Name = colnames(geno), Chromosome = 1, Position = 1:10), path = tempfile(), xref_path = "chip1")
savePlink(obj, path = tempdir(), name = "demo", run_plink = FALSE, chunk_size = 5)
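Writing the .ped file in chunks of chunk_size individuals keeps only a slice of the genotype data formatted as text at any time. The base-R sketch below illustrates the idea on a numeric 0/1/2 matrix. The helper `write_ped_chunked` is hypothetical, the allele labels "A" and "B" are an assumption, and savePlink's own implementation may differ. The six leading columns and the "0 0" missing genotype follow the PLINK .ped format.

```r
# Hypothetical helper (not part of SNPkit): write a PLINK .ped file in chunks.
# Genotypes are 0/1/2 counts of allele "B"; NA becomes PLINK's missing "0 0".
write_ped_chunked <- function(geno, file, chunk_size = 1000) {
  alleles <- c("A A", "A B", "B B")
  con <- file(file, open = "w")
  on.exit(close(con))
  starts <- seq(1, nrow(geno), by = chunk_size)
  for (s in starts) {
    rows <- s:min(s + chunk_size - 1, nrow(geno))
    lines <- vapply(rows, function(i) {
      calls <- alleles[geno[i, ] + 1]
      calls[is.na(calls)] <- "0 0"
      # FID, IID, father, mother, sex (0 = unknown), phenotype (-9 = missing)
      paste(c(rownames(geno)[i], rownames(geno)[i], 0, 0, 0, -9, calls),
            collapse = " ")
    }, character(1))
    # Each chunk is flushed to disk before the next one is formatted
    writeLines(lines, con)
  }
  invisible(NULL)
}

geno <- matrix(c(0, 1, 2, NA, 1, 1), nrow = 3,
               dimnames = list(c("S1", "S2", "S3"), c("SNP1", "SNP2")))
ped_file <- tempfile(fileext = ".ped")
write_ped_chunked(geno, ped_file, chunk_size = 2)
readLines(ped_file)
```

With chunk_size = 2 the three individuals are written in two passes, and the resulting file has one line per individual.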
A class for configuring SNP file import options.
path: Path to the SNP file.
fields: A list specifying column mappings or field configurations.
codes: Character vector for genotype or allele codes.
threshold: Numeric value for filtering or quality control.
sep: Character specifying the field separator.
skip: Number of lines to skip at the top of the file.
A class for managing a list of SNP file import configurations.
configs: A list of SNPFileConfig objects.
Subsets an SNPDataLong object by rows (individuals) or columns (SNPs).
You can specify which individuals or SNP markers to keep or remove.
Subset(object, index, margin = 1, keep = TRUE)

## S4 method for signature 'SNPDataLong'
Subset(object, index, margin = 1, keep = TRUE)

| Argument | Description |
|---|---|
| object | A |
| index | Character vector with row (individual) or column (SNP) names to filter. |
| margin | Integer: 1 = rows (individuals), 2 = columns (SNPs). |
| keep | Logical; if |
A new SNPDataLong object, subsetted accordingly.
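The keep/remove logic along a margin can be illustrated on a plain matrix. This is a base-R sketch under the assumption that keep = TRUE retains the named entries and keep = FALSE drops them; the helper `subset_by_name` is hypothetical and not the package's method. When SNPs are subset, the map would need the same filtering, which the last line mimics.

```r
# Hypothetical helper (not part of SNPkit): keep or drop named rows/columns.
subset_by_name <- function(mat, index, margin = 1, keep = TRUE) {
  labels <- if (margin == 1) rownames(mat) else colnames(mat)
  sel <- labels %in% index
  if (!keep) sel <- !sel
  if (margin == 1) mat[sel, , drop = FALSE] else mat[, sel, drop = FALSE]
}

geno <- matrix(0:11 %% 3, nrow = 3,
               dimnames = list(paste0("ind", 1:3), paste0("snp", 1:4)))
map <- data.frame(Name = paste0("snp", 1:4), Chromosome = 1, Position = 1:4)

# Remove one SNP (margin = 2, keep = FALSE) and keep the map in sync
kept <- subset_by_name(geno, "snp2", margin = 2, keep = FALSE)
map_kept <- map[map$Name %in% colnames(kept), , drop = FALSE]

# Keep two individuals (margin = 1, keep = TRUE)
two_inds <- subset_by_name(geno, c("ind1", "ind3"), margin = 1, keep = TRUE)
```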
Provides a detailed summary of an SNPDataLong object, including sample
and SNP counts, proportion of missing data, and SNP distribution by chromosome
if mapping information is available.
## S4 method for signature 'SNPDataLong'
summary(object, ...)

| Argument | Description |
|---|---|
| object | An object of class |
| ... | Further arguments passed to methods. |
An object of class summary.SNPDataLong, which is a list with
the following elements:
Integer. Number of individuals (rows of geno).
Integer. Number of SNPs (columns of geno).
Integer. Total number of missing genotype calls.
Numeric. Proportion of missing genotype calls.
Either a table of SNP counts per chromosome (when
the map provides Name and Chromosome) or NULL.
Either a table of SNPs with at least one
missing call per chromosome, or NULL.
The object also has a dedicated print method that displays the
summary on the console.
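The quantities listed above can be reproduced on a small raw matrix in base R. This sketch assumes the snpStats raw encoding, in which as.raw(0) marks a missing call, and uses toy objects `raw_geno` and `map` invented for illustration; it is not the package's summary method.

```r
# Toy raw genotype matrix: 2 individuals x 4 SNPs, as.raw(0) = missing call
raw_geno <- matrix(as.raw(c(1, 0, 3, 2, 2, 0, 1, 1)), nrow = 2,
                   dimnames = list(c("S1", "S2"), paste0("snp", 1:4)))
map <- data.frame(Name = paste0("snp", 1:4), Chromosome = c(1, 1, 2, 2))

is_missing <- raw_geno == as.raw(0)
n_individuals <- nrow(raw_geno)
n_snps <- ncol(raw_geno)
n_missing <- sum(is_missing)
prop_missing <- n_missing / length(raw_geno)

# Align chromosomes to the genotype columns before tabulating
chr <- map$Chromosome[match(colnames(raw_geno), map$Name)]
snps_per_chr <- table(chr)
# SNPs with at least one missing call, counted per chromosome
missing_per_chr <- table(chr[colSums(is_missing) > 0])
```

Here two of the eight calls are missing (a proportion of 0.25), each chromosome holds two SNPs, and each chromosome has one SNP with a missing call.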