Title: | Iterative Pruning to Capture Population Structure |
---|---|
Description: | An unsupervised clustering algorithm based on iterative pruning is for capturing population structure. This version supports ordinal data which can be applied directly to SNP data to identify fine-level population structure and it is built on the iterative pruning Principal Component Analysis ('ipPCA') algorithm as explained in Intarapanich et al. (2009) <doi:10.1186/1471-2105-10-382>. The 'IPCAPS' involves an iterative process using multiple splits based on multivariate Gaussian mixture modeling of principal components and 'Expectation-Maximization' clustering as explained in Lebret et al. (2015) <doi:10.18637/jss.v067.i06>. In each iteration, rough clusters and outliers are also identified using the function rubikclust() from the R package 'KRIS'. |
Authors: | Kridsadakorn Chaichoompu [aut, cre], Kristel Van Steen [aut], Fentaw Abegaz [aut], Sissades Tongsima [aut], Philip Shaw [aut], Anavaj Sakuntabhai [aut], Luisa Pereira [aut] |
Maintainer: | Kridsadakorn Chaichoompu <[email protected]> |
License: | GPL-3 |
Version: | 1.1.8 |
Built: | 2024-12-25 06:44:07 UTC |
Source: | CRAN |
(Internal function) Calculae a vector of EigenFit values, internally used for parallelization
cal.eigen.fit(eigen.value)
cal.eigen.fit(eigen.value)
eigen.value |
A vector of Eigenvalues return from |
A vector of all possible values for EigenFit.
(Internal function) Check whether the IPCAPS process meets the stopping criterion.
check.stopping(eigen.value, threshold)
check.stopping(eigen.value, threshold)
eigen.value |
A vector of Eigenvalues return from |
threshold |
A threshold or a cutoff to stop the IPCAPS process. Also see
|
A list containing status
, eigen.value
, eigen.fit
,
threshold
, and no.significant.PC
as explained below:
$status
is either 0
representing that the criterion is
not met, or 1
representing that the criterion is met.
$eigen.value
is a vector of Eigenvalues as the input parameter.
$eigen.fit
is a vector of EigenFit values.
$threshold
is a threashold as the input parameter.
$no.significant.PC
is an estimated number of sinificant
principal components (PC).
(Internal function) Perform the clustering process of IPCAPS
clustering( dataframe, node = 1, result.dir, threshold, min.fst, method = "mix", min.in.group = 20, datatype = "snp", nonlinear = FALSE )
clustering( dataframe, node = 1, result.dir, threshold, min.fst, method = "mix", min.in.group = 20, datatype = "snp", nonlinear = FALSE )
dataframe |
A data frame containing |
node |
An integer representing the current node number which is being analyzed. |
result.dir |
An output directory |
threshold |
A threshold or a cutoff to stop the IPCAPS process. Also see
|
min.fst |
A number represents a cutoff for mininum Fst value. |
method |
A clustering method selected from the |
min.in.group |
A integer represents a minimum number of group members. |
datatype |
To specify whether the input data are 'snp' or other type. Defalut = 'snp'. |
nonlinear |
(Unimplemented) To specify whether linear or non-linear method is used for IPCAPS analysis. If TRUE, non-linear method is used, otherwise linear method is used. Default = FALSE. |
A list containing status
, node
, and new.index
as explained below:
$status
is either 0
representing that the criterion is
not met, or 1
representing that the criterion is met.
$node
is an integer representing the current node number which
is being analyzed.
$new.index
is a list of vectors containing a subset of indexes
split from dataframe$index
according to a clustering result.
(Internal function) Select a clustering method to be used for the IPCAPS process.
clustering.mode(node, work.dir, method)
clustering.mode(node, work.dir, method)
node |
An integer representing the current node number which is being analyzed. |
work.dir |
A working directory. |
method |
A clustering method selected from the |
A vector of cluster assignment, for which cluster each individual belongs.
(Internal function) Calculate a vector of different values from a vector of EigenFit values, internally used for parallelization
## S3 method for class 'eigen.fit' diff(eigen.value)
## S3 method for class 'eigen.fit' diff(eigen.value)
eigen.value |
A vector of Eigenvalues return from |
A vector of different values from a vector of EigenFit values
(Internal function) Check the different value of X and Y, internally used for parallelization
## S3 method for class 'xy' diff(x, y)
## S3 method for class 'xy' diff(x, y)
x |
The first number |
y |
The second number |
The different number of x
and y
.
(Internal function) Perform regression models, internally used for parallelization
do.glm(X, PC, method = "linear")
do.glm(X, PC, method = "linear")
X |
A vector of data |
PC |
A matrix of principal components |
method |
Specify a method to be used for regression model, which can be "linear", "poisson", and "negative.binomial". Default = "linear" |
A vector of residuals computed from regression model
Export clustering result of
ipcaps
to text file called 'groups.txt'.
export.groups(result.dir)
export.groups(result.dir)
result.dir |
A result directory as the |
After running, this function exports the file called 'groups.txt' to the same result directory. If 'groups.txt' already exists in the result directory, the exported file is changed to 'groups1.txt', 'groups2.txt', 'groups3.txt', ..., accordingly.
A data frame of exported data containing 4 columns; group
,
node
, label
, row.number
, as described below for more
details:
group
represents group membership of IPCAPS result.
node
represents node numbers of IPCAPS result.
label
represents labels of rows in orginal input data.
row.number
represents row numbers of orginal input data.
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to export the IPCAPS result to a text file exported.data <- export.groups(my.cluster$output.dir) print(dim(exported.data)) head(exported.data)
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to export the IPCAPS result to a text file exported.data <- export.groups(my.cluster$output.dir) print(dim(exported.data)) head(exported.data)
Obtain the information for specified
node from the output list of ipcaps
.
get.node.info(cluster.obj, node)
get.node.info(cluster.obj, node)
cluster.obj |
A list returned from the |
node |
An integer representing a node number to enquire information as shown in the HTML output files. |
The return value is NULL
if node's information does not exist or a list containing
PCs
, eigen.fit
, index
, and label
as explained
below:
$PCs
is a matrix of pricipal components of this node.
$eigen.fit
is a number represeting the EigenFit value of this
node.
$index
is a vector of row number (individuals) of raw.data
(input data).
$label
is the vector of labels of all individuals that belongs
to this node.
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to obtain the information of specified node, for example, node 3 node.info <- get.node.info(my.cluster,3) ls(node.info)
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to obtain the information of specified node, for example, node 3 node.info <- get.node.info(my.cluster,3) ls(node.info)
This version supports ordinal data which can be applied directly to SNP data to identify fine-scale population structure. It was built on the iterative pruning Principal Component Analysis (ipPCA) algorithm (Intarapanich et al., 2009; Limpiti et al., 2011). The IPCAPS involves an iterative process using multiple splits based on multivariate Gaussian mixture modeling of principal components and Clustering EM estimation (Lebret et al., 2015). In each iteration, rough clusters and outliers are also identified using our own method called rubikclust (R package KRIS).
ipcaps( bed = NA, rdata = NA, files = NA, label.file = NA, lab.col = 1, out, plot.as.pdf = FALSE, method = "mix", missing = NA, covariate = NA, cov.col.first = NA, cov.col.last = NA, threshold = 0.18, min.fst = 8e-04, min.in.group = 20, no.plot = FALSE )
ipcaps( bed = NA, rdata = NA, files = NA, label.file = NA, lab.col = 1, out, plot.as.pdf = FALSE, method = "mix", missing = NA, covariate = NA, cov.col.first = NA, cov.col.last = NA, threshold = 0.18, min.fst = 8e-04, min.in.group = 20, no.plot = FALSE )
bed |
A PLINK binary format consists of 3 files; bed, bim, and fam. To generate these files from PLINK, use option –make-bed. See more details at: http://zzz.bwh.harvard.edu/plink/data.shtml. |
rdata |
In case of re-analysis, it is convenient to run IPCAPS using the file rawdata.RData generated by IPCAPS. This file contains a matrix of SNPs (raw.data) and a vector of labels (label). |
files |
IPCAPS supports SNPs encoded as 0, 1 and 2 (dosage encoding). Rows represent SNPs and columns represent subjects. Each column needs to be separated by a space or a tab. A big text file should be divided into smaller files to load faster. For instance, to input 3 files, use as: files=c( 'input1.txt', 'input2.txt', 'input3.txt'). |
label.file |
An additional useful information (called "labels" in IPCAPS) related subject, for example, geographic location or disease phenotype. These labels (one at a time) are used in displaying the clustering outcome of IPCAPS. A label file must contain at least one column. However, it may contain more than one column in which case each column need to be separated by a space or a tab. |
lab.col |
The label in the label file to be used in the tree-like display of IPCAPS clustering results. |
out |
To set an absolute path for IPCAPS output. If the specified output directory already exists, result files are saved in sub-directories cluster_out, cluster_out1, cluster_out2, etc. |
plot.as.pdf |
To export plots as PDF. When omitted, plots are saved as PNG. |
method |
The internal clustering method. It can be set to "mix" (rubikclust & mixmod), "mixmod" (Lebret et al., 2015), "clara" (R: Clustering Large Applications), "pam" (R: Partitioning Around Medoids (PAM) Object), "meanshift" (Wang, 2016), "apcluster" (Bodenhofer et al., 2016), and "hclust" (R: Hierarchical Clustering). Default = "mix". |
missing |
Symbol used for missing genotypes. Default = NA. |
covariate |
A file of covariates; one covariate per column. SNPs can be adjusted for these covariates via regression modeling and residual computation. |
cov.col.first |
Refer to a covariate file, the first covariate to be considered as confounding variable. |
cov.col.last |
Refer to a covariate file, the last covariate to be considered as confounding variable. All the variables in between the cov.col.first and cov.col.last will be considered in the adjustment process. |
threshold |
Cutoff value for EigenFit. Possible values range from 0.03 to 0.18. The smaller, the potentially finer the substructure can be. Default = 0.18. |
min.fst |
Minimum Fst between a pair of subgroups. Default = 0.0008. |
min.in.group |
Minimum number of individuals to constitute a cluster or subgroup. Default = 20. |
no.plot |
No plot is generated if this option is TRUE. This option is useful when the system does not support X Windows in the unix based system. Default = FALSE. |
The computational time depends on the number of individuals. Consequentially, if large data sets are analyzed, it may be necessary first to split data into smaller files, and then load into computer memory (with parameter 'files'). Internally, the split files are merged to construct a com-prehensive matrix.
Returns the list object containing 2 internal objects; output.dir as class character and cluster as class data.frame. The object output.dir stores a result directory. The object cluster contains 4 columns, group, node, label, and row.number. The column group contains the assigned groups from IPCAPS. The column node contains node numbers in a tree as illustrated in the HTML result files. The column label contains the given labels that is set to the parameter label. The column row.number contains indices to an input data, which is matched to row numbers of input matrix. All clustering result files are saved in one directory (as specified by out) containing assigned groups, hierarchical trees of group membership, PC plots, and binary data for further analysis.
$snp
is a SNP matrix from BED file.
$snp.info
is a data.frame of SNP information from BIM file.
$ind.info
is a data.frame of individual information from FAM file.
If function return NULL
, it means the input files are not in proper
format.
Bodenhofer, U., Palme, J., Melkonian, C., and Kothmeier, A. (2016). apcluster : Affinity Propagation Clustering. Available at: https://CRAN.R-project.org/package=apcluster (Accessed March 7, 2017).
Intarapanich, A., Shaw, P. J., Assawamakin, A., Wangkumhang, P., Ngamphiw, C. , Chaichoompu, K., et al. (2009). Iterative pruning PCA improves resolution of highly structured populations. BMC Bioinformatics 10, 382. doi:10.1186/ 1471-2105-10-382.
Lebret, R., Iovleff, S., Langrognet, F., Biernacki, C., Celeux, G., and Govaert, G. (2015). Rmixmod: TheRPackage of the Model-Based Unsupervised, Supervised, and Semi-Supervised ClassificationMixmodLibrary. J. Stat. Softw. 67. doi:10.18637/jss.v067.i06.
Limpiti, T., Intarapanich, A., Assawamakin, A., Shaw, P. J., Wangkumhang, P., Piriyapongsa, J., et al. (2011). Study of large and highly stratified population datasets by combining iterative pruning principal component analysis and structure. BMC Bioinformatics 12, 255. doi:10.1186/1471-2105-12- 255.
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., and Hornik, K. (2017). cluster: Cluster Analysis Basics and Extensions.
R: Clustering Large Applications Available at: https://stat.ethz.ch/R-manual/R-devel/library/cluster/html/clara.html (Accessed March 7, 2017).
R Core Team (2017). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing Available at: https://www.R-project.org/.
R: Hierarchical Clustering Available at: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/hclust.html (Accessed March 7, 2017).
R: Partitioning Around Medoids (PAM) Object Available at: https://stat.ethz.ch/R-manual/R-devel/library/cluster/html/pam.object.html (Accessed March 7, 2017).
Wang, M. C. and D. (2016). MeanShift: Clustering via the Mean Shift Algorithm. Available at: https://CRAN.R-project.org/package=MeanShift (Accessed March 7, 2017).
#Use the BED format as input #Importantly, bed file, bim file, and fam file are required #Use the example files embedded in the package BED.file <- system.file("extdata", "ipcaps_example.bed", package = "IPCAPS") LABEL.file <- system.file("extdata", "ipcaps_example_individuals.txt.gz", package = "IPCAPS") my.cluster1 <- ipcaps(bed = BED.file, label.file = LABEL.file, lab.col = 2, out = tempdir()) table(my.cluster1$cluster$label, my.cluster1$cluster$group) # Alternatively, use a text file as input # Use the example files embedded in the package #text.file <- system.file("extdata", "ipcaps_example_rowVar_colInd.txt.gz", # package="IPCAPS") #LABEL.file <- system.file("extdata", "ipcaps_example_individuals.txt.gz", # package="IPCAPS") #my.cluster2 <- ipcaps(files = c(text.file), label.file = LABEL.file, lab.col = 2, # out=tempdir()) #table(my.cluster2$cluster$label, my.cluster2$cluster$group) # The other alternative way, use an R Data file as input # Use the example file embedded in the package #rdata.file <- system.file("extdata", "ipcaps_example.rda", package = "IPCAPS") #my.cluster3 <- ipcaps(rdata = rdata.file, out = tempdir()) #table(my.cluster3$cluster$label, my.cluster3$cluster$group)
#Use the BED format as input #Importantly, bed file, bim file, and fam file are required #Use the example files embedded in the package BED.file <- system.file("extdata", "ipcaps_example.bed", package = "IPCAPS") LABEL.file <- system.file("extdata", "ipcaps_example_individuals.txt.gz", package = "IPCAPS") my.cluster1 <- ipcaps(bed = BED.file, label.file = LABEL.file, lab.col = 2, out = tempdir()) table(my.cluster1$cluster$label, my.cluster1$cluster$group) # Alternatively, use a text file as input # Use the example files embedded in the package #text.file <- system.file("extdata", "ipcaps_example_rowVar_colInd.txt.gz", # package="IPCAPS") #LABEL.file <- system.file("extdata", "ipcaps_example_individuals.txt.gz", # package="IPCAPS") #my.cluster2 <- ipcaps(files = c(text.file), label.file = LABEL.file, lab.col = 2, # out=tempdir()) #table(my.cluster2$cluster$label, my.cluster2$cluster$group) # The other alternative way, use an R Data file as input # Use the example file embedded in the package #rdata.file <- system.file("extdata", "ipcaps_example.rda", package = "IPCAPS") #my.cluster3 <- ipcaps(rdata = rdata.file, out = tempdir()) #table(my.cluster3$cluster$label, my.cluster3$cluster$group)
raw.data
A dataset contains a character vector of 1,004 elements containing labels or populations of 1,004 individuals which they belong. Five populations and outliers were labeled as "pop1", "pop2", "pop3", "pop4", "pop5", and "outlier".
data(ipcaps_example)
data(ipcaps_example)
A vector with 1,004 elements.
output.template
contains $lno_title
, $lno_data
,
$lno_leafnode
, $lno_body
, and $template
as explained
below:
A index number of $template
that a title of HTML
file is replaced.
A index number of $template
that a data section in
the HTML file is replaced.
A index number of $template
that a leaf-node
section in the HTML file is replaced.
A index number of $template
that a body section in
the HTML file is replaced.
A vector of characters for HTML file.
output.template
output.template
A list with with 5 objects
(Internal function) Manipulate categorical input files
pasre.categorical.data(files)
pasre.categorical.data(files)
files |
IPCAPS supports categorical data, which rows represent features and columns represent subjects or individuals. Each column needs to be separated by a space or a tab. A big text file should be divided into smaller files to load faster. For instance, to input 3 files, use as: files=c( 'input1.txt', 'input2.txt', 'input3.txt'). |
A data frame of input data
raw.data
A dataset contains a numeric matrix of 1,004 rows and 10 columns of top 10
PCs calculated from the dataset raw.data
. The PCs were calculated
using linear principal component analysis (PCA), see more datails at
KRIS::cal.pc.linear
data(ipcaps_example)
data(ipcaps_example)
A matrix with 10 columns and 1,004 rows
(Internal function) Perform the post-processing step of IPCAPS
postprocess(result.dir, reanalysis = FALSE)
postprocess(result.dir, reanalysis = FALSE)
result.dir |
A result directory as the |
reanalysis |
(Unimplemented) To specify whether it is re-analysis or not. If TRUE, it is re-analysis, otherwise it is not. Default = FALSE. |
A data frame of clustering result containing 4 columns;
group
, node
, label
, row.number
, as described
below for more details:
group
represents group membership of IPCAPS result.
node
represents node numbers of IPCAPS result.
label
represents labels of rows in orginal input data.
row.number
represents row numbers of orginal input data.
(Internal function) Perform the pre-processing step of IPCAPS
preprocess( files, label.file, lab.col, rdata.infile, bed.infile, cate.list, result.dir, threshold, min.fst, max.thread = NA, reanalysis = FALSE, method = "mix", min.in.group = 20, datatype = "snp", nonlinear = FALSE, missing.char = NA, regression.file = NA, regression.col.first = NA, regression.col.last = NA, reg.method = "linear", plot.as.pdf = NA, no.plot = NA )
preprocess( files, label.file, lab.col, rdata.infile, bed.infile, cate.list, result.dir, threshold, min.fst, max.thread = NA, reanalysis = FALSE, method = "mix", min.in.group = 20, datatype = "snp", nonlinear = FALSE, missing.char = NA, regression.file = NA, regression.col.first = NA, regression.col.last = NA, reg.method = "linear", plot.as.pdf = NA, no.plot = NA )
files |
IPCAPS supports SNPs encoded as 0, 1 and 2 (dosage encoding). Rows represent SNPs and columns represent subjects. Each column needs to be separated by a space or a tab. A big text file should be divided into smaller files to load faster. For instance, to input 3 files, use as: files=c( 'input1.txt', 'input2.txt', 'input3.txt'). |
label.file |
An additional useful information (called "labels" in IPCAPS) related subject, for example, geographic location or disease phenotype. These labels (one at a time) are used in displaying the clustering outcome of IPCAPS. A label file must contain at least one column. However, it may contain more than one column in which case each column need to be separated by a space or a tab. |
lab.col |
The label in the label file to be used in the tree-like display of IPCAPS clustering results. |
rdata.infile |
In case of re-analysis, it is convenient to run IPCAPS using the file rawdata.RData generated by IPCAPS. This file contains a matrix of SNPs (raw.data) and a vector of labels (label). |
bed.infile |
A PLINK binary format consists of 3 files; bed, bim, and fam. To generate these files from PLINK, use option –make-bed. See more details at: http://zzz.bwh.harvard.edu/plink/data.shtml. |
cate.list |
(Unimplemented) A list of categorical input file (text). For instance, to input 3 files, use as: files=c('input1.txt', 'input2.txt', 'input3.txt'). |
result.dir |
To set an absolute path for IPCAPS output. If the specified output directory already exists, result files are saved in sub-directories cluster_out, cluster_out1, cluster_out2, etc. |
threshold |
Cutoff value for EigenFit. |
min.fst |
Minimum Fst between a pair of subgroups. |
max.thread |
(Require the parallelization patch) Maximum number of concurrent threads. |
reanalysis |
(Unimplemented) To specify whether it is re-analysis or not. If TRUE, it is re-analysis, otherwise it is not. Default = FALSE. |
method |
The internal clustering method. It can be set to "mix" (rubikclust & mixmod), "mixmod" (Lebret et al., 2015), "clara" (R: Clustering Large Applications), "pam" (R: Partitioning Around Medoids (PAM) Object), "meanshift" (Wang, 2016), "apcluster" (Bodenhofer et al., 2016), and "hclust" (R: Hierarchical Clustering). Default = "mix". |
min.in.group |
Minimum number of individuals to constitute a cluster or subgroup. |
datatype |
To specify whether the input data are 'snp' or other type. Defalut = 'snp'. |
nonlinear |
(Unimplemented) To specify whether linear or non-linear method is used for IPCAPS analysis. If TRUE, non-linear method is used, otherwise linear method is used. Default = FALSE. |
missing.char |
Symbol used for missing genotypes. Default = NA. |
regression.file |
A file of covariates; one covariate per column. SNPs can be adjusted for these covariates via regression modeling and residual computation. |
regression.col.first |
Refer to a covariate file, the first covariate to be considered as confounding variable. |
regression.col.last |
Refer to a covariate file, the last covariate to be considered as confounding variable. All the variables in between the cov.col.first and cov.col.last will be considered in the adjustment process. |
reg.method |
(Fixed) Specify a method for regression analysis. Default = 'linear'. |
plot.as.pdf |
To export plots as PDF. When omitted, plots are saved as PNG. |
no.plot |
No plot is generated if this option is TRUE. This option is useful when the system does not support X Windows in the unix based system. Default = FALSE. |
A data frame of input data.
(Internal function) Perform the iterative process for each node
process.each.node(node, work.dir)
process.each.node(node, work.dir)
node |
An integer representing the current node number which is being analyzed. |
work.dir |
A working directory. |
NULL
The raw.data is the simulated dataset which consists of 3,000 independent SNPs
and 1,004 individuals belonging to one of 5 populations (200 individuals
each) and 4 outlying individuals. The matrix raw.data
contains the number 0, 1,
and 2 representing SNP in additive coding. The pairwise genetic distance between
populations are listed below (see Balding, 1995):
pop1 | pop2 | pop3 | pop4 | pop5 | |
pop1 | 0.0040 | 0.0059 | 0.0085 | 0.0101 | |
pop2 | 0.0040 | 0.0055 | 0.0082 | 0.0099 | |
pop3 | 0.0059 | 0.0055 | 0.0104 | 0.0119 | |
pop4 | 0.0085 | 0.0082 | 0.0104 | 0.0139 | |
pop5 | 0.0101 | 0.0099 | 0.0119 | 0.0139 |
data(ipcaps_example)
data(ipcaps_example)
A matrix with 3,000 columns and 1,004 rows
Balding, D.J., and Nichols, R.A. (1995). A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica 96, 3-12.
(Internal function) Replace missing values by specified values, internally used for parallelization
replace.missing(X, missing = NA, rep)
replace.missing(X, missing = NA, rep)
X |
A vector of data |
missing |
The old characters representing a missing value, for example, 'NA' or '-' |
rep |
The new characters to replace the old characters, for example, 'NULL' or '-1' |
A vector of data with replaced character for miss values
Generate HTML file called 'tree_scree.html' from the result of
ipcaps
. This function is a part of workflow in
save.plots
. The clustering result is shown as a tree rendering
by the online Google Organizational Chart library. Note that the Internet is
required to view the HTML file.
save.eigenplots.html(output.dir)
save.eigenplots.html(output.dir)
output.dir |
A result directory as the |
After running, this function generates the file called 'tree_scree.html' in the same result directory. All plots are generated and saved as image files in the sub-directory 'images'.
NULL
save.html
,
save.plots
,
save.plots.cluster.html
,
and save.plots.label.html
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to generate HTML file save.eigenplots.html(my.cluster$output.dir)
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to generate HTML file save.eigenplots.html(my.cluster$output.dir)
Generate HTML file called 'tree_text.html' from the result of
ipcaps
. The clustering result is shown as a tree rendering by
the online Google Organizational Chart library. Note that the Internet is
required to view the HTML file.
save.html(output.dir)
save.html(output.dir)
output.dir |
A result directory as the |
After running, this function generates the file called 'tree_text.html' in the same result directory.
NULL
save.plots
,
save.plots.cluster.html
,
save.eigenplots.html
,
and save.plots.label.html
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to generate HTML file save.html(my.cluster$output.dir)
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to generate HTML file save.html(my.cluster$output.dir)
Generate HTML files and all image files (plots) from the result
of ipcaps
. The clustering result is shown as a tree rendering
by the online Google Organizational Chart library. Note that the Internet is
required to view the HTML files.
save.plots(output.dir)
save.plots(output.dir)
output.dir |
A result directory as the |
After running, this function generates all plots and saves as image files in the
sub-directory 'images'. It calls save.plots.cluster.html
,
save.eigenplots.html
, and save.plots.label.html
to generate all HTML files.
NULL
save.html
,
save.plots.cluster.html
,
save.eigenplots.html
,
and save.plots.label.html
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to generate all plots and HTML files save.plots.label.html(my.cluster$output.dir)
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to generate all plots and HTML files save.plots.label.html(my.cluster$output.dir)
Generate HTML file called 'tree_scatter_cluster.html' from the
result of ipcaps
. This function is a part of workflow in
save.plots
. The clustering result is shown as a tree rendering
by the online Google Organizational Chart library. Note that the Internet is
required to view the HTML file.
save.plots.cluster.html(output.dir)
save.plots.cluster.html(output.dir)
output.dir |
A result directory as the |
After running, this function generates the file called 'tree_scatter_cluster.html' in the same result directory. All plots are generated and saved as image files in the sub-directory 'images'.
NULL
save.html
,
save.plots
,
save.eigenplots.html
,
and save.plots.label.html
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to generate HTML file save.plots.cluster.html(my.cluster$output.dir)
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to generate HTML file save.plots.cluster.html(my.cluster$output.dir)
Generate HTML file called 'tree_scatter_label.html' from the
result of ipcaps
. This function is a part of workflow in
save.plots
. The clustering result is shown as a tree rendering
by the online Google Organizational Chart library. Note that the Internet is
required to view the HTML file.
save.plots.label.html(output.dir)
save.plots.label.html(output.dir)
output.dir |
A result directory as the |
After running, this function generates the file called 'tree_scatter_label.html' in the same result directory. All plots are generated and saved as image files in the sub-directory 'images'.
NULL
save.html
,
save.plots
,
save.plots.cluster.html
,
and save.eigenplots.html
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to generate HTML file save.plots.label.html(my.cluster$output.dir)
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) #Here, to generate HTML file save.plots.label.html(my.cluster$output.dir)
Detects top discriminators that contribute to group separation based on the fixation index (Fst).
top.discriminator( cluster.obj = NULL, group1, group2, bim.file, use.node.number = FALSE, num.top = 100, percentile = 0.9, use.percentile = FALSE, use.path = FALSE, result.path = NULL )
top.discriminator( cluster.obj = NULL, group1, group2, bim.file, use.node.number = FALSE, num.top = 100, percentile = 0.9, use.percentile = FALSE, use.path = FALSE, result.path = NULL )
cluster.obj |
The object which is returned from |
group1 |
To specify the first group number to be compared. (also see
|
group2 |
To specify the second group number to be compared. (also see
|
bim.file |
Option: In case that SNP information is not provided to
|
use.node.number |
To specify whether a group number or a node number is be used. If TRUE, a node nubmer is used instead. Default = FALSE. |
num.top |
A number of top Fst SNPs to be returned. This parameter is
used when |
percentile |
A percentile for top Fst SNPs to be returned. This parameter
is used when |
use.percentile |
A logical value to indicate whether |
use.path |
A logical value to indicate whether |
result.path |
A path to an result directory of IPCAPS. This parameter is
used when |
The returned value is a data.frame of SNP information sorting by Fst in descending order, which contains 7 columns, chr, SNP, centimorgans, position, allele1, allele2, and Fst. The column 1-6 are SNP information from the bim file. The column Fst contains estimated Fst between group1 and group2.
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) table(my.cluster$cluster$label,my.cluster$cluster$group) # 1 2 3 4 5 6 # outlier4 5 4 1 0 0 0 # pop1 0 0 0 0 250 0 # pop2 0 0 0 0 0 250 # pop3 0 0 0 250 0 0 #Identify top discriminators between groups, for example, group 4 and group 5 top.snp <-top.discriminator(my.cluster,4,5) #or, specify the bim file #top.snp <-top.discriminator(my.cluster,4,5,bim.file="ipcaps_example.bim") head(top.snp) # chr SNP centimorgans position allele1 allele2 Fst #V5452 1 marker5452 0 54520000 A T 0.11337260 #V2348 1 marker2348 0 23480000 A T 0.11194490 #V8244 1 marker8244 0 82440000 A T 0.09556580 #V5972 1 marker5972 0 59720000 A T 0.08747794 #V3561 1 marker3561 0 35610000 A T 0.08725860 #V8419 1 marker8419 0 84190000 A T 0.08293494 #Alternatively, specify the previous result directory of IPCAPS and identify #top discriminators between groups, for example, group 4 and group 5 previous.res.path <- my.cluster$output.dir top.snp <-top.discriminator(result.path = previous.res.path, use.path = TRUE, group1 = 4, group2 = 5) head(top.snp) #Identify top discriminators between groups, for example, group 4 and group 5 top.snp <-top.discriminator(my.cluster,4,5) #or, specify the bim file #top.snp <-top.discriminator(my.cluster,4,5,bim.file="ipcaps_example.bim") dim(top.snp) head(top.snp) #Here, it is possible to select the top Fst SNPs based on a percentile. top.snp <-top.discriminator(my.cluster,4,5, percentile = 0.9, use.percentile = TRUE) dim(top.snp) head(top.snp) #Identify top discriminators between groups, for example, node 7 and node 8 top.snp2 <-top.discriminator(my.cluster,7,8,use.node.number=TRUE) head(top.snp2) # chr SNP centimorgans position allele1 allele2 Fst #V5452 1 marker5452 0 54520000 A T 0.11337260 #V2348 1 marker2348 0 23480000 A T 0.11194490
# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir()) table(my.cluster$cluster$label,my.cluster$cluster$group) # 1 2 3 4 5 6 # outlier4 5 4 1 0 0 0 # pop1 0 0 0 0 250 0 # pop2 0 0 0 0 0 250 # pop3 0 0 0 250 0 0 #Identify top discriminators between groups, for example, group 4 and group 5 top.snp <-top.discriminator(my.cluster,4,5) #or, specify the bim file #top.snp <-top.discriminator(my.cluster,4,5,bim.file="ipcaps_example.bim") head(top.snp) # chr SNP centimorgans position allele1 allele2 Fst #V5452 1 marker5452 0 54520000 A T 0.11337260 #V2348 1 marker2348 0 23480000 A T 0.11194490 #V8244 1 marker8244 0 82440000 A T 0.09556580 #V5972 1 marker5972 0 59720000 A T 0.08747794 #V3561 1 marker3561 0 35610000 A T 0.08725860 #V8419 1 marker8419 0 84190000 A T 0.08293494 #Alternatively, specify the previous result directory of IPCAPS and identify #top discriminators between groups, for example, group 4 and group 5 previous.res.path <- my.cluster$output.dir top.snp <-top.discriminator(result.path = previous.res.path, use.path = TRUE, group1 = 4, group2 = 5) head(top.snp) #Identify top discriminators between groups, for example, group 4 and group 5 top.snp <-top.discriminator(my.cluster,4,5) #or, specify the bim file #top.snp <-top.discriminator(my.cluster,4,5,bim.file="ipcaps_example.bim") dim(top.snp) head(top.snp) #Here, it is possible to select the top Fst SNPs based on a percentile. top.snp <-top.discriminator(my.cluster,4,5, percentile = 0.9, use.percentile = TRUE) dim(top.snp) head(top.snp) #Identify top discriminators between groups, for example, node 7 and node 8 top.snp2 <-top.discriminator(my.cluster,7,8,use.node.number=TRUE) head(top.snp2) # chr SNP centimorgans position allele1 allele2 Fst #V5452 1 marker5452 0 54520000 A T 0.11337260 #V2348 1 marker2348 0 23480000 A T 0.11194490