Title: | From Biological Sequences to Multidimensional Scaling |
---|---|
Description: | Utilities dedicated to the analysis of biological sequences by metric MultiDimensional Scaling with projection of supplementary data. It contains functions for reading multiple sequence alignment files, calculating distance matrices, performing metric multidimensional scaling and visualizing results. |
Authors: | Julien Pele with Jean-Michel Becu, Rym Ben Boubaker, Herve Abdi, and Marie Chabbert |
Maintainer: | Marie Chabbert <[email protected]> |
License: | GPL |
Version: | 1.2.3 |
Built: | 2024-11-08 06:30:23 UTC |
Source: | CRAN |
The bios2mds
package is developed in the Bioinformatics team at Integrated
Neurovascular Biology Laboratory, UMR CNRS 6214 / INSERM 771, University of Angers - FRANCE.
This package is dedicated to the analysis of biological sequences by metric MultiDimensional Scaling (MDS) with projection of supplementary data. It contains functions for reading multiple sequence alignment (MSA) files, calculating distance matrices from MSA files, performing MDS analysis and visualizing results. The MDS analysis and visualization tools can be applied to any kind of data.
The main functionalities of bios2mds
are summarized below:
Several functions allow users to read multiple sequence alignment files and to compute matrices of distances between these sequences:
import.fasta
: reads a multiple sequence alignment in FASTA format.
import.msf
: reads a multiple sequence alignment in MSF format.
mat.dif
: computes a matrix of pairwise distances between sequences based on sequence difference.
mat.dis
: computes a matrix of pairwise distances between sequences based on sequence dissimilarity.
A function performs metric MDS analysis of a distance matrix between active elements with the option of projecting supplementary elements onto the active space.
mmds
: performs metric multidimensional scaling.
mmds.project
: performs projection of supplementary elements onto the active space.
Several functions are proposed to visualize results of metric MDS analysis:
scree.plot
: draws the scree plot of eigenvalues.
mmds.2D.plot
: draws a scatter plot of the MDS coordinates.
mmds.2D.multi
: draws a scatter plot of the MDS coordinates with projection of multiple groups of supplementary element.
mmds.3D.plot
: Displays a 3D plot of the MDS coordinates.
mmds.plot
: wrapper function that draws the scree plot and three scatter plots of MDS coordinates.
col.group
: colors scatter plots with user provided groupings and colors.
write.mmds.pdb
: writes MDS coordinates in a PDB formatted file for 3D visualisation.
Several functions allow users to perform data clustering and to assess the clustering robustness:
kmeans.run
: performs multiple runs of K-means clustering and analyzes clusters
sil.score
: calculates the silhouette score from multiple K-means runs to determine the optimal number of clusters.
extract.cluster
: extraction of clusters alignments
Two raw datasets are proposed to test the functionalities of bios2mds
. They correspond to the multiple sequence alignments of GPCRs from H. sapiens and D. melanogaster in .msf and .fa formats (msa/human_gpcr.* and msa/drome_gpcr.*). They are based on the non-redundant sets of non-olfactory class A GPCRs, prepared and analyzed in Deville et al. (2009) and updated with the July 2009 release of Uniprot http://www.uniprot.org.
Each MSA file is related to a .csv file that assigns a group and a color to each sequence of the alignment (csv/human_gpcr_group.csv and csv/drome_gpcr_group.csv).
Pre-analyzed data from these two MSA files are in gpcr
. Moreover, gpcr
contain projection of GPCRs from N. vectensis and from C. intestinalis onto the active space of human GPCRs calculated by MDS analysis.
For an index of functions, use library(help = bios2mds
).
Package | bios2mds |
Type | Package |
Version | 1.2.2 |
Repository | CRAN |
Date | 2012-06-01 |
License | GPL version 2 or newer |
Collate | Other useful packages can be found in the CRAN task view. |
See https://mirror.its.sfu.ca/mirror/CRAN/web/views/Multivariate.html | |
and https://mirror.its.sfu.ca/mirror/CRAN/web/views/Cluster.html |
Julien Pele <[email protected]> with Jean-Michel Becu <[email protected]>, Herve Abdi <[email protected]> and Marie Chabbert <[email protected]>.
Maintainer: Marie Chabbert <[email protected]>
citation('bios2mds')
# The MSA files provided with the package correspond to the sequence # alignment of non-olfactory class A G-protein-coupled receptors from # H. sapiens and D. melanogaster prepared by Deville et al. (2009). # loading GPCR data wd <- tempdir() file <- file.path(wd,"R.pdb") data(gpcr) # building distance matrices between the aligned GPCR sequences from # H. sapiens and D. melanogaster human <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) drome <- import.fasta(system.file("msa/drome_gpcr.fa", package = "bios2mds")) #active <- mat.dif(human, human) # or active <- gpcr$dif$sapiens.sapiens #sup <- mat.dif(drome, human) # or sup <- gpcr$dif$melanogaster.sapiens # performing MDS analysis of the GPCR sequences from H. sapiens mmds_active <- mmds(active, group.file=system.file( "csv/human_gpcr_group.csv",package = "bios2mds")) # performing MDS analysis of the GPCR sequences from H. sapiens # with projection of GPCRs from D. melanogaster # as supplementary elements onto the space of human GPCRs mmds_sup <- mmds.project(mmds_active, sup,system.file( "csv/drome_gpcr_group.csv",package = "bios2mds")) # displaying MDS coordinates layout(matrix(1:6, 2, 3)) scree.plot(mmds_active$eigen.perc, lab = TRUE, title = "Scree plot of metric MDS") mmds.2D <- mmds.2D.plot(mmds_active, title = "Sequence space of human GPCRs") mmds.2D.plot(mmds_active,mmds_sup, title = "Projection of GPCRs from D. melanogaster onto the space space of human GPCRs ", active.alpha = 0.3) # writing PDB files for 3D visualization of MDS coordinates write.mmds.pdb(mmds_active,file.pdb=file)
# The MSA files provided with the package correspond to the sequence # alignment of non-olfactory class A G-protein-coupled receptors from # H. sapiens and D. melanogaster prepared by Deville et al. (2009). # loading GPCR data wd <- tempdir() file <- file.path(wd,"R.pdb") data(gpcr) # building distance matrices between the aligned GPCR sequences from # H. sapiens and D. melanogaster human <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) drome <- import.fasta(system.file("msa/drome_gpcr.fa", package = "bios2mds")) #active <- mat.dif(human, human) # or active <- gpcr$dif$sapiens.sapiens #sup <- mat.dif(drome, human) # or sup <- gpcr$dif$melanogaster.sapiens # performing MDS analysis of the GPCR sequences from H. sapiens mmds_active <- mmds(active, group.file=system.file( "csv/human_gpcr_group.csv",package = "bios2mds")) # performing MDS analysis of the GPCR sequences from H. sapiens # with projection of GPCRs from D. melanogaster # as supplementary elements onto the space of human GPCRs mmds_sup <- mmds.project(mmds_active, sup,system.file( "csv/drome_gpcr_group.csv",package = "bios2mds")) # displaying MDS coordinates layout(matrix(1:6, 2, 3)) scree.plot(mmds_active$eigen.perc, lab = TRUE, title = "Scree plot of metric MDS") mmds.2D <- mmds.2D.plot(mmds_active, title = "Sequence space of human GPCRs") mmds.2D.plot(mmds_active,mmds_sup, title = "Projection of GPCRs from D. melanogaster onto the space space of human GPCRs ", active.alpha = 0.3) # writing PDB files for 3D visualization of MDS coordinates write.mmds.pdb(mmds_active,file.pdb=file)
Links elements in a mmds
object or mmds.project
object to user-provided groups and colors.
col.group(x,file)
col.group(x,file)
x |
a 'mmds' or 'project' object obtained from |
file |
a string of characters to indicate the file name assigning groups and colors to each active OR each supplementary element of the mmds object. |
col.group
assigns each element of the mmds object to user-provided groupings and colors for coloring and labeling mmds scatter plots.
col.group
requires a formatted file. See "csv/human_gpcr_group.csv" for an example. Each line corresponds to one element of the mmds object and must contain three parameters separated by ",".
The first parameter is the element name, as given in the multiple sequence alignment file.
The second parameter is the group name. Groupings must be provided by the user.
The third parameter is the group color in full letters (example : "black","green"). Two or more groups can have the same color, but elements within the same group must have the same color. The group is colored by the first color encountered.
Adds data to a mmds object in order to color and label mmds scatter plots with user-provided groupings and colors.
Jean-Michel Becu
See colors
function (default R package).
See getcol
in made4
package for special color palette developed to maximize the contrast between colors.
# performing metric MDS on human GPCRs with projection of # GPCRs from D. melanogaster as supplementary data: data(gpcr) active <- gpcr$dif$sapiens.sapiens mmds_active <- mmds(active) mmds_active<-col.group(mmds_active,system.file("csv/human_gpcr_group.csv" ,package = "bios2mds"))
# performing metric MDS on human GPCRs with projection of # GPCRs from D. melanogaster as supplementary data: data(gpcr) active <- gpcr$dif$sapiens.sapiens mmds_active <- mmds(active) mmds_active<-col.group(mmds_active,system.file("csv/human_gpcr_group.csv" ,package = "bios2mds"))
Measures the difference score between two aligned amino acid or nucleotide sequences.
dif(seq1, seq2, gap = FALSE, aa.strict = FALSE)
dif(seq1, seq2, gap = FALSE, aa.strict = FALSE)
seq1 |
a character vector representing a first sequence. |
seq2 |
a character vector representing a second sequence. |
gap |
a boolean indicating whether the gap character should be taken as a supplementary symbol (TRUE) or not (FALSE). Default is FALSE. |
aa.strict |
a boolean indicating whether only strict amino acids should be taken into account (TRUE) or not (FALSE). Default is FALSE. |
The difference score between two aligned sequences is given by the proportion of sites that differs and is equivalent to (percent identity).
dif
is given by the number of aligned positions (sites) whose symbols differ, divided by the number of aligned positions. dif
is equivalent to the p distance defined by Nei and Zhang (2006).
In dif
, positions with at least one gap can be excluded (gap = FALSE). When gaps are taken as a supplementary symbol (gap = TRUE), sites with gaps in both sequences are excluded.
From Nei and Zhang (2006), the p distance, which is the proportion of sites that differ between two sequences, is estimated by:
where n is the number of sites and is the number of sites with different symbols.
The difference score ranges from 0, for identical sequences, to 1, for completely different sequences.
A single numeric value representing the difference score.
Julien Pele
May AC (2004) Percent sequence identity: the need to be explicit. Structure 12:737-738.
Nei M and Zhang J (2006) Evolutionary Distance: Estimation. Encyclopedia of Life Sciences.
Nei M and Kumar S (2000) Molecular Evolution and Phylogenetics. Oxford University Press, New York.
# calculating the difference score between the sequences # of CLTR1_HUMAN and CLTR2_HUMAN: aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) dif <- dif(aln$CLTR1_HUMAN, aln$CLTR2_HUMAN) dif
# calculating the difference score between the sequences # of CLTR1_HUMAN and CLTR2_HUMAN: aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) dif <- dif(aln$CLTR1_HUMAN, aln$CLTR2_HUMAN) dif
Computes the dissimilarity score between two aligned amino acid sequences, based on substitution matrices.
dis(seq1, seq2, sub.mat.id = "PAM250", gap = NULL)
dis(seq1, seq2, sub.mat.id = "PAM250", gap = NULL)
seq1 |
a character vector representing a first amino acid sequence. |
seq2 |
a character vector representing a second amino acid sequence. |
sub.mat.id |
a string of characters indicating the amino acid substitution matrix to be taken into
account for calculation. This should be one of "PAM40", "PAM80", "PAM120", "PAM160", "PAM250", "BLOSUM30", "BLOSUM45", "BLOSUM62", "BLOSUM80", "GONNET", "JTT", "JTT_TM" and "PHAT". Default is PAM250. See |
gap |
a numeric vector of length 2, indicating the penalty for initiating and extending a strand of two gaps (gap ignored, default). |
Grishin and Grishin (2002) developped a method to calculate the similarity score with amino acid substitution matrices.
Let s be an amino acid substitution matrix with elements s(a, b),
let A be an alignment of n sequences, is a symbol (amino acid
or gap: '-') in the site k of the sequence i. For each pair of sequences i and
j from A, the following equations(1), (2), (3) and (4) are calculated as follows:
, called score per site, is obtained as:
where is the set of sites k such that
'-' and
'-'
and
is the number of elements in
.
, called average upper limit of the score per site, is obtained as:
, called score per site expected from random sequences, is obtained as:
where is the frequency of amino acid 'a' in
protein sequence of A
over all sites in
.
, called normalized score (Feng and Doolittle, 1997), is obtained as:
The normalized score ranges from 0 (for random sequences) to 1 (for identical
sequences). However, for very divergent sequences,
can become negative due to
statistical errors. In this case,
dis
attributes 0 to negative scores.
The dissimilarity score between sequences i and j is obtained from the similarity score as:
A single numeric value representing the dissimilarity score.
Julien Pele
Grishin VN and Grishin NV (2002) Euclidian space and grouping of biological objects. Bioinformatics 18:1523-1534.
Feng DF and Doolittle RF (1997) Converting amino acid alignment scores into measures of evolutionary time: a simulation study of various relationships. J Mol Evol 44:361-370.
# calculating dis between the sequences of CLTR1_HUMAN and CLTR2_HUMAN: aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) dis <- dis(aln$CLTR1_HUMAN, aln$CLTR2_HUMAN) dis
# calculating dis between the sequences of CLTR1_HUMAN and CLTR2_HUMAN: aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) dis <- dis(aln$CLTR1_HUMAN, aln$CLTR2_HUMAN) dis
Writes a multiple sequence alignment (MSA) file in FASTA format.
export.fasta(x, outfile = "alignment.fa", ncol = 60, open = "w")
export.fasta(x, outfile = "alignment.fa", ncol = 60, open = "w")
x |
an object of class 'align', obtained from |
outfile |
a string of characters or string vector to indicate the name of the MSA file(s) to be written. If x is an object of class 'align', default is "alignment.fa". If x is an element list, for each element in x, default is the element name, followed by the ".alignement.fa" extension. |
ncol |
an integer value indicating the number of characters per line for the sequences in outfile.Default is 60. |
open |
a character value indicating the opening mode for outfile. This should be either of "w" to write into a new file, or "a" to append at the end of an already existing file. Default is "w". |
Initially, FASTA (for FAST-ALL) was the input format of the FASTA program, used for protein comparison and searching in databases. Presently, FASTA format is a standard format for biological sequences.
The FASTA formatted file of a single sequence displays:
a single-line description beginning with a greater-than (>) symbol. The following word is the identifier.
followed by any number of lines, representing biological sequence.
For multiple alignments, the FASTA formatted sequences are concatenated to create a multiple FASTA format.
Produces a FASTA file for an 'align' object or a FASTA file for each cluster in list.
For further information about FASTA format, see: http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml
Jean-Michel Becu
write.fasta
function from seqinr
package.
# reading of the multiple sequence alignment of human GPCRS in FASTA format: wd <- tempdir() #wd <- getwd() file1 <- file.path(wd,"alignment.fa") aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) export.fasta(aln,file1)
# reading of the multiple sequence alignment of human GPCRS in FASTA format: wd <- tempdir() #wd <- getwd() file1 <- file.path(wd,"alignment.fa") aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) export.fasta(aln,file1)
Extracts the multiple sequence alignement of each cluster after K-means clustering.
extract.cluster(x, align)
extract.cluster(x, align)
x |
an object of class 'kmean', obtained from |
align |
an object of class 'align', obtained from |
Extraction of the MSA of each cluster.
A named list of 'align' objects.
Jean-Michel Becu
# Clustering human GPCRs in 4 groups with 100 runs of K-means # and extraction of the alignment of each cluster aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) data(gpcr) kmeans <- kmeans.run(gpcr$mmds$sapiens.active$coord, nb.clus = 4, nb.run = 100) clusAlign <- extract.cluster(kmeans,aln)
# Clustering human GPCRs in 4 groups with 100 runs of K-means # and extraction of the alignment of each cluster aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) data(gpcr) kmeans <- kmeans.run(gpcr$mmds$sapiens.active$coord, nb.clus = 4, nb.run = 100) clusAlign <- extract.cluster(kmeans,aln)
This data set was obtained by the bios2mds
analysis of the two multiple sequence alignment files provided in /msa. The MSA files were prepared as previously described by Deville et al. (2009) and updated with the July 2009 release of Uniprot (http://www.uniprot.org). They correspond to non-redundant sets of non-olfactory class A G-protein-coupled receptors (GPCRs) from H. sapiens and D. melanogaster (283 and 59 sequences, respectively). The data sets from H. sapiens and D. melanogaster constitute the active and supplementary data sets, respectively.
data(gpcr)
data(gpcr)
gpcr
is a named list of three elements:
a named list containing two distance matrices calculated from the mat.dif
function (distances based on difference scores) with default parameters:
a 283 by 283 matrix of difference scores between the 283 aligned sequences of human GPCRs.
a 59 by 283 matrix of difference scores between the aligned sequences of GPCRs from H. sapiens and D. melanogaster (283 and 59 sequences, respectively)..
a named list containing sixteen distance matrices calculated from the mat.dis
function (distances based on dissimilarity scores) for the eight substitution matrices in sub.mat
(other parameters by default):
are 283 by 283 matrices of dissimilarity scores between the 283 aligned sequences of human GPCRs.
is calculated with PAM40.
is calculated with PAM80.
is calculated with PAM120.
is calculated with PAM250.
is calculated with BLOSUM30.
is calculated with BLOSUM45.
is calculated with BLOSUM62.
is calculated with BLOSUM80.
is calculated with GONNET.
is calculated with JTT.
is calculated with JTT_TM.
is calculated with PHAT.
are 59 by 283 matrices of dissimilarity scores between the 283 aligned sequences from H. sapiens and D. melanogaster
is calculated with PAM40.
is calculated with PAM80.
is calculated with PAM120.
is calculated with PAM250.
is calculated with BLOSUM30.
is calculated with BLOSUM45.
is calculated with BLOSUM62.
is calculated with BLOSUM80.
is calculated with GONNET.
is calculated with JTT.
is calculated with JTT_TM.
is calculated with PHAT.
a typical example of metric MDS analysis with the mmds
function of bios2mds
using GPCRs from H. sapiens as active data and GPCRs from D. melanogaster, N. vectensis and C. intestinalis for supplementary data.
metric MDS analysis of active data from H. sapiens.
projection of supplementary data from D. melanogaster onto the human active space.
projection of supplementary data from N. vectensis onto active the human space.
projection of supplementary data from C. intestinalis onto the human active space.
Deville J, Rey J and Chabbert M (2009) An indel in transmembrane helix 2 helps to trace the molecular evolution of class A G-protein-coupled receptors. J Mol Evol 68: 475- 489.
# loading gpcr data(gpcr) # displaying the matrix of differences scores between GPCRs sequences # from H. sapiens gpcr$dif$sapiens.sapiens # displaying the matrix of dissimilarity scores between GPCRs sequences # from H. sapiens gpcr$dis$sapiens.sapiens$PAM250 # displaying the matrix of dissimilarity scores between GPCRs sequences # from H. sapiens and D. Melanogaster calculated with the BLOSUM45 matrix gpcr$dis$melanogaster.sapiens$BLOSUM45 # displaying mmds analysis of the MSA of GPCRs from H. sapiens # and D. Melanogaster gpcr$mmds$sapiens.active gpcr$mmds$melanogaster.project
# loading gpcr data(gpcr) # displaying the matrix of differences scores between GPCRs sequences # from H. sapiens gpcr$dif$sapiens.sapiens # displaying the matrix of dissimilarity scores between GPCRs sequences # from H. sapiens gpcr$dis$sapiens.sapiens$PAM250 # displaying the matrix of dissimilarity scores between GPCRs sequences # from H. sapiens and D. Melanogaster calculated with the BLOSUM45 matrix gpcr$dis$melanogaster.sapiens$BLOSUM45 # displaying mmds analysis of the MSA of GPCRs from H. sapiens # and D. Melanogaster gpcr$mmds$sapiens.active gpcr$mmds$melanogaster.project
Reads a Multiple Sequence Alignment (MSA) file in FASTA format (.fasta or .fa extension).
import.fasta(file, aa.to.upper = TRUE, gap.to.dash = TRUE)
import.fasta(file, aa.to.upper = TRUE, gap.to.dash = TRUE)
file |
a string of characters to indicate the name of the MSA file to be read. |
aa.to.upper |
a logical value indicating whether amino acids should be converted to upper case (TRUE) or not (FALSE). Default is TRUE. |
gap.to.dash |
a logical value indicating whether the dot (.) and tilde ( |
Initially, FASTA (for FAST-ALL) was the input format of the FASTA program, used for protein comparison and searching in databases. Presently, FASTA format is a standard format for biological sequences.
The FASTA formatted file of a single sequence displays:
a single-line description beginning with a greater-than (>) symbol. The following word is the identifier.
followed by any number of lines, representing biological sequence.
For multiple alignments, the FASTA formatted sequences are concatenated to create a multiple FASTA format.
A object of class 'align', which is a named list whose elements correspond to sequences, in the form of character vectors.
For further information about FASTA format, see: http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml
Julien Pele
Pearson WR and Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 27:2444-2448.
read.fasta
function from bio3d
package.read.fasta
function from seqinr
package.read.FASTA
function from aaMI
package (archived).
# reading of the multiple sequence alignment of human GPCRS in FASTA format: aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds"))
# reading of the multiple sequence alignment of human GPCRS in FASTA format: aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds"))
Reads a Multiple Sequence Alignment (MSA) file in MSF format (.msf extension).
import.msf(file, aa.to.upper = TRUE, gap.to.dash = TRUE)
import.msf(file, aa.to.upper = TRUE, gap.to.dash = TRUE)
file |
a string of characters to indicate the name of the MSA file to be read. |
aa.to.upper |
a logical value indicating whether amino acids should be converted to upper case (TRUE) or not (FALSE). Default is TRUE. |
gap.to.dash |
a logical value indicating whether the dot (.) and tilde ( |
Initially, Multiple Sequence Format (MSF) was the multiple sequence alignment format of the Wisconsin Package (WP) or GCG (Genetic Computer Group). This package is a suite of over 130 sequence analysis programs for database searching, secondary structure prediction or sequence alignment. Presently, numerous multiple sequence alignment editors (Jalview and GeneDoc for example) can read and write MSF files.
MSF file displays several specificities:
a header containing sequence identifiers and characteristics (length, check and weight).
a separator symbolized by 2 slashes (//).
sequences of identifiers, displayed by consecutive blocks.
A object of class 'align', which is a named list whose elements correspond to sequences, in the form of character vectors.
import.msf
checks the presence of duplicated identifiers in header. Sequences whose
identifiers are missing in header are ignored.
Julien Pele
read.alignment
function from seqinr
package.read.GDoc
function from aaMI
package (archived).
# reading of the multiple sequence alignment of human GPCRs in MSF format: aln <- import.msf(system.file("msa/human_gpcr.msf", package = "bios2mds"))
# reading of the multiple sequence alignment of human GPCRs in MSF format: aln <- import.msf(system.file("msa/human_gpcr.msf", package = "bios2mds"))
Performs multiple runs of K-means clustering and analyzes data.
kmeans.run(mat, nb.clus = 2, nb.run = 1000, iter.max = 10000, method = "euclidean")
kmeans.run(mat, nb.clus = 2, nb.run = 1000, iter.max = 10000, method = "euclidean")
mat |
a numeric matrix representing the coordinates of the elements after metric MDS analysis. |
nb.clus |
a numeric value indicating the number of clusters. Default is 2. |
nb.run |
a numeric value indicating the number of runs. Default is 1000. |
iter.max |
a numeric value indicating the maximum number of iterations for K-means. Default is 10000. |
method |
a string of characters to determine the distance to be used. This should be one of "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman" or "kendall". Default is "euclidean". |
The aim of K-means clustering is the partition of elements into a user-provided number of clusters. Several runs of K-means analysis on the same data may return different cluster assignments because the K-means procedure attributes random initial centroids for each run. The robustness of an assignment depends on its reproducibility.
The function matchClasses
from the e1071
package is used to compare the cluster assignments of the different runs and returns a score of agreement between them. The most frequent clustering solution is selected and used as a reference to assess the reproducibility of the analysis.
kmeans.run
returns two lists. In either list, the clusters refer to those observed in the most frequent solution. The first list provides, for each element, the relative ratio of its assignment to each cluster in the different runs. The second list provides, for each cluster, the list of the assigned elements along with the relative assignment to this cluster in the different runs.
A object of class 'kmean', which is a named list of two elements
elements |
a named list of elements with the relative assignment of each element to each cluster. |
clusters |
a named list of clusters with the elements assigned to this cluster in the most frequent solution and their relative assignment to this cluster in multiple runs. |
During the K-means procedure, an empty cluster can be obtained if no objects are allocated to the
cluster. In kmeans.run
, runs with empty clusters are discarded.
kmeans.run
requires Kmeans
and matchClasses
functions from amap
and e1071
packages, respectively.
Julien Pele
# Clustering human GPCRs in 4 groups with 100 runs of K-means data(gpcr) coord <- gpcr$mmds$sapiens.active$coord kmeans.run1 <- kmeans.run(coord, nb.clus = 4, nb.run = 100) kmeans.run1$clusters kmeans.run1$elements
# Clustering human GPCRs in 4 groups with 100 runs of K-means data(gpcr) coord <- gpcr$mmds$sapiens.active$coord kmeans.run1 <- kmeans.run(coord, nb.clus = 4, nb.run = 100) kmeans.run1$clusters kmeans.run1$elements
Computes a matrix providing the distances based on the difference scores between sequences from two multiple sequence alignments.
mat.dif(align1, align2, gap = FALSE, aa.strict = FALSE, sqrt = FALSE)
mat.dif(align1, align2, gap = FALSE, aa.strict = FALSE, sqrt = FALSE)
align1 |
a list of character vectors representing a first multiple sequence aligment. |
align2 |
a list of character vectors representing a second multiple sequence aligment. |
gap |
a logical value indicating whether gap character should be taken as supplementary symbol (TRUE) or not (FALSE). Default is FALSE. |
aa.strict |
a logical value indicating whether only strict amino acids should be taken into account (TRUE) or not (FALSE). To be used only for amino acid sequences. Default is FALSE. |
sqrt |
a logical value indicating whether the distance should be equal to the squared root of the difference score (TRUE) or not (FALSE). Default is FALSE. |
If align1
and align2
are identical, mat.dif
computes the symetrical matrix of distances between each sequence of the alignment.
Before using mat.dif
, users must check the alignment of sequences within align1
and align2
and between align1
and align2
.
A named numeric matrix providing the difference-based distances between each pair of sequences from align1
and align2
. The number of rows and columns is identical to the number of sequences in align1
and align2
, respectively.
Julien Pele and Jean-Michel Becu
identity
function from bio3d
package.
# calculating the matrix of distances based on the difference scores # between GPCRs sample from H. sapiens and D. melanogaster: aln_human <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) aln_drome <- import.fasta(system.file("msa/drome_gpcr.fa", package = "bios2mds")) mat.dif1 <- mat.dif(aln_human[1:5], aln_drome[1:5]) mat.dif1
# calculating the matrix of distances based on the difference scores # between GPCRs sample from H. sapiens and D. melanogaster: aln_human <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) aln_drome <- import.fasta(system.file("msa/drome_gpcr.fa", package = "bios2mds")) mat.dif1 <- mat.dif(aln_human[1:5], aln_drome[1:5]) mat.dif1
Computes a matrix providing the distances based on dissimilarity scores between sequences from two multiple sequence alignments.
mat.dis(align1, align2, sub.mat.id = "PAM250", sqrt=FALSE)
mat.dis(align1, align2, sub.mat.id = "PAM250", sqrt=FALSE)
align1 |
a list of character vectors representing a first multiple sequence aligment. |
align2 |
a list of character vectors representing a second multiple sequence aligment. |
sub.mat.id |
a string of characters indicating the amino acid substitution matrix used for calculation
of the dissimilarity score. This should be one of "PAM40", "PAM80", "PAM120", "PAM160", "PAM250", "BLOSUM30", "BLOSUM45", "BLOSUM62", "BLOSUM80", "GONNET", "JTT", "JTT_TM" and "PHAT".
The supported substitution matrices are in |
sqrt |
a logical value indicating whether the distance should be equal to the squared root of the difference score (TRUE) or not (FALSE). Default is FALSE. |
The dissimilarity score between a sequence i from align1
and a sequence j from align2
is calculated with an amino acid substitution matrix from sub.mat
.
If align1
and align2
are identical, mat.dis
computes the symetrical matrix of distances between each sequence of the alignment.
Before using mat.dis
, users must check the alignment of sequences within align1
and align2
and between align1
and align2
.
A named numeric matrix providing the dissimilarity-based distances between each pair of sequences from align1
and align2
, based on the substitution matrix sub.mat.id
. The number of rows and columns is identical to the number of sequences in align1
and align2
, respectively.
Julien Pele and Jean-Michel Becu
# calculating dissimilarity distances between GPCR sequences sample from #H. sapiens and D. melanogaster, based on the PAM250 matrix: aln_human <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) aln_drome <- import.fasta(system.file("msa/drome_gpcr.fa", package = "bios2mds")) mat.dis1 <- mat.dis(aln_human[1:5], aln_drome[1:5]) mat.dis1 # calculating dissimilarity distances between GPCRs sequences sample from #H. sapiens and D. melanogaster, based on the BLOSUM45 matrix: aln_human <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) aln_drome <- import.fasta(system.file("msa/drome_gpcr.fa", package = "bios2mds")) mat.dis1 <- mat.dis(aln_human[1:5], aln_drome[1:5], sub.mat.id = "BLOSUM45") mat.dis1
# calculating dissimilarity distances between GPCR sequences sample from #H. sapiens and D. melanogaster, based on the PAM250 matrix: aln_human <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) aln_drome <- import.fasta(system.file("msa/drome_gpcr.fa", package = "bios2mds")) mat.dis1 <- mat.dis(aln_human[1:5], aln_drome[1:5]) mat.dis1 # calculating dissimilarity distances between GPCRs sequences sample from #H. sapiens and D. melanogaster, based on the BLOSUM45 matrix: aln_human <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) aln_drome <- import.fasta(system.file("msa/drome_gpcr.fa", package = "bios2mds")) mat.dis1 <- mat.dis(aln_human[1:5], aln_drome[1:5], sub.mat.id = "BLOSUM45") mat.dis1
Performs metric MultiDimensional Scaling (MDS) analysis of active elements.
mmds(active, pc = 3, group.file = NULL)
mmds(active, pc = 3, group.file = NULL)
active |
a numeric matrix of distances between active elements. |
pc |
a numeric value indicating the number of principal components to be saved. Default is 3. |
group.file |
a string of characters to indicate the file name assigning groups and colors to each active element. |
Metric multidimensional scaling is a statistical analysis technique aimed at analyzing a matrix of distances between 'active' elements. MDS maps the elements onto a low dimensional space (usually 2D or 3D), In this space, elements are represented by points whose respective distances best approximate the initial distance.
active
must have some characteristics:
active
represents the matrix of pairwise distances between active elements. The active matrix must be symmetric (square and equals to its transpose). The distances on the main diagonal must be equal to 0. The distances do not have to be Euclidean. They can just express a difference or dissimilarity score, with a 0 value between same elements and a positive value between different elements.
The method for the computation of metric MDS is described by Abdi (2007). Briefly, if N is the number of active sequences and D is the N by N matrix of the squared distances computed from the active matrix, the mmds
function performs the following steps:
Transforms D into a cross-product matrix S:
where I is the N by N identity matrix, m is the N by 1 matrix mass, where each mass equal to and 1 is an N by N matrix of ones.
Transforms S into a factor matrix F:
where M is the diag{m}, U is the eigenvector matrix and is the diagonal matrix of the eigenvalues, such as S =
, where
denotes the transposition operation.
The eigenvectors of S, also called principal components (whose number is smaller or equal to N), form the active space. F gives the coordinates of the active elements in this space.
A object of class 'mmds', which is a named list of five elements:
eigen |
a numeric vector of the eigenvalues. |
eigen.perc |
a numeric vector of the relative eigenvalues (eigenvalues divided by the sum of the absolute eigenvalues). |
coord |
a named numeric matrix representing the coordinates of active elements. |
group |
a named string matrix representing the differents groups of elements and the associate color (Default is 'NoGroup' and 'black'). |
col |
a named string matrix representing, foreach named elements, the associate group and color (Default is 'NoGroup' and 'black'). |
source |
a named list with 2 elements, the matrix (D) of squared distance and the vector of mass m, that will used for projection of supplementary elements, if required, with the |
If active
do not contain names:
A tag "A" followed by an incremented number names the rows and the columns of active
.
Julien Pele and Jean-Michel Becu
Abdi H (2007) Metric multidimensional scaling. In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 598-605.
For further information on multidimentional scaling:
Takane Y, Jung S and Oshima-Takane Y (2009) Multidimensional scaling, in Handbook of quantitative methods in psychology, eds Millsap R, Maydeu-Olivares A (Sage Publications, London) pp. 219-242.
Borg I and Groenen PJF (2005) Modern multidimensional scaling. New York : Springer.
Gower JC (1967) A comparison of some methods of cluster analysis. Biometrics 23:623-637.
ToRgerson WS (1958) Theory and methods of scaling. New York : Wiley.
cmdscale
function from stats
package.dudi.pco
, suprow
and supcol
functions from ade4
package.PCA
function from FactoMineR
package.
# performing metric MDS of human GPCRs with projection of # GPCRs from D. melanogaster as supplementary elements: data(gpcr) active <- gpcr$dif$sapiens.sapiens mmds1 <- mmds(active = active) mmds1$active.coord
# performing metric MDS of human GPCRs with projection of # GPCRs from D. melanogaster as supplementary elements: data(gpcr) active <- gpcr$dif$sapiens.sapiens mmds1 <- mmds(active = active) mmds1$active.coord
Displays a scatter plot of the active elements and the barycenter of supplementary elements, or of groups of supplementary elements, after a metric MDS analysis.
mmds.2D.multi(x,project, title = NULL, axis = c(1, 2), xlim = NULL, ylim = NULL, outfile.type = NULL,bary="p", outfile.name = "mmds",new.plot = TRUE, active.col = x$col[,3], active.alpha = 0.6, active.pch = 20, sup.pch = NULL , active.cex = 2, sup.cex = 2, active.legend.cex = 2, sup.legend.cex = 2, active.legend.lwd = 1, sup.legend.lwd = 2, active.lwd = 1, sup.lwd = 3, legend = TRUE, active.legend.pos = "bottomleft", sup.legend.pos = "bottomright", group.name = NULL, ensemble.legend.name = "", group.col = NULL, outfile.width = NULL, outfile.height = NULL, box.lwd = 1, cex.axis = 1, sup.legend.text = 1, active.legend.text = 1, legend.axis = TRUE, grid = TRUE, axes = TRUE)
mmds.2D.multi(x,project, title = NULL, axis = c(1, 2), xlim = NULL, ylim = NULL, outfile.type = NULL,bary="p", outfile.name = "mmds",new.plot = TRUE, active.col = x$col[,3], active.alpha = 0.6, active.pch = 20, sup.pch = NULL , active.cex = 2, sup.cex = 2, active.legend.cex = 2, sup.legend.cex = 2, active.legend.lwd = 1, sup.legend.lwd = 2, active.lwd = 1, sup.lwd = 3, legend = TRUE, active.legend.pos = "bottomleft", sup.legend.pos = "bottomright", group.name = NULL, ensemble.legend.name = "", group.col = NULL, outfile.width = NULL, outfile.height = NULL, box.lwd = 1, cex.axis = 1, sup.legend.text = 1, active.legend.text = 1, legend.axis = TRUE, grid = TRUE, axes = TRUE)
x |
an object of class 'mmds', obtained from |
project |
an object of class 'project' or a list of objects of class 'project', obtained from the |
title |
a string of characters representing the title of the plot. Default is "Metric MDS". |
bary |
a character to specify the representation of group barycenter with "p" for representation of group barycenter foreach 'project' and active space by dots and "l" for representation of barycenter linked foreach group in order of appearence in project list and with the barycenter of active group in last position. Default is "p". |
axis |
a numeric vector of length two representing the principal components displayed on the plot. Default is c(1, 2). |
xlim |
a numeric vector representing the range for the x values. Default is full range. |
ylim |
a numeric vector representing the range for the y values. Default is full range. |
outfile.type |
a string indicating the extension type of the graph outfile. Default is NULL. If not NULL, this should be one of "pdf", "tiff", "png" or "postscript". In this case, the parameter outfile.name and the forthcoming parameters are activated. |
outfile.name |
a string ("mmds", default) indicating the name and directory of pdf graph outfile. The extension file is added automaticaly. See |
new.plot |
a boolean indicating whether a new graphical device should be created (TRUE) or not (FALSE). Default is TRUE. |
active.pch |
an integer indicating the symbol of active elements. Default is 20, corresponding to dots. |
sup.pch |
an integer vector representing the symbol for project barycenter. Default is NULL. If sup.pch is NULL then symbol are choose randomly. |
active.col |
a string of characters or character vector representing the color(s) of the active elements. Default is x$col[,3].
It corresponds either to the user-provided colors if the |
active.alpha |
a numeric value indicating the alpha value for opacity of active objects. This value must range from 0 (invisible) to 1 (full opacity). Default is 0.6. |
active.cex |
a numeric value indicating the size of the active symbols. Default is 2. |
sup.cex |
a numeric value indicating the size of the supplementary symbols. Default is 2. |
active.legend.cex |
a numeric value indicating the size of active symbols in legend. Default is 2. |
sup.legend.cex |
a numeric value indicating the size of supplementary symbols in legend. Default is 2. |
active.lwd |
a numeric value indicating the width of active symbols. Default is 1. |
sup.lwd |
a numeric value indicating the width of supplementary symbols. Default is 3. |
active.legend.lwd |
a numeric value indicating the width of active symbols in legend. Default is 1. |
sup.legend.lwd |
a numeric value indicating the width of supplementary objects in legend. Default is 4. |
legend |
a boolean indicating whether the legend should be displayed (TRUE) or not (FALSE). Default is TRUE. |
active.legend.pos |
a string indicating the position of the legend for active elements. Default is "topleft". |
sup.legend.pos |
a string indicating the position of the legend for supplementary elements. Default is "topright". |
group.name |
a string vector indicating the names of the used groups. NULL vector allow to use all groups present in the 'mmds' object (x) and in all 'mmds.project' object. It corresponds either to the user-provided groups if the |
ensemble.legend.name |
a string vector indicating the names of the |
group.col |
a string vector indicating the colors of appearance of the used groups. This vector must have the same length of the group.name vector. NULL vector associate automaticaly the color of groups in 'mmds' and 'mmds.project' object to the group in the group.name parameter. It corresponds either to the user-provided colors if the |
outfile.width |
a numeric value in inches indicating the width of graph outfile. Default differs by outfile.type. See |
outfile.height |
a numeric value in inches indicating the height of graph outfile. Default differs by outfile.type. The resolution for tiff and png figures is 150 dpi. See |
box.lwd |
a numeric value indicating the border width of graph box and legend box. Default is 1. |
cex.axis |
a numeric value indicating the character size for the x and y axes. Default is 1. |
sup.legend.text |
a numeric value indicating the character size of the supplementary tag in legend. Default is 1. |
active.legend.text |
a numeric value indicating the character size of active tag in legend. Default is 1. |
legend.axis |
a boolean indicating whether axis name should be displayed (TRUE) or not (FALSE). Default is TRUE. |
grid |
a boolean indicating whether grid should be displayed (TRUE) or not (FALSE). Default is TRUE. |
axes |
a boolean indicating whether x and y axes should be displayed (TRUE) or not (FALSE). Default is TRUE. |
If mmds.2D.plot
is used after the col.group
function,
the elements are colored by the color scheme provided in the .csv file (see col.group
for details).
If the col.group
function has not been used, the default colors are black and magenta for active and supplementary elements.
mmds.2D.plot
helps identify patterns in data and compare active and supplementary elements.
active.alpha
argument is helpful for visualization of supplementary elements because it allows the symbols of supplementary elements to be in the foreground as compared to active elements.
Produces a scatter plot on the active graphical device.
mmds.2D.multi
requires alpha
function from scales
package.
Jean-Michel Becu
plot.PCA
function from FactoMineR
package.png
, pdf
, postscript
functions (default R package).
# scatter plot of human GPCRs onto the first two axes obtained # from MDS analysis with projection of GPCRs from N.vectensis # and C.intestinalis as supplementary elements: data(gpcr) mmds_human <- gpcr$mmds$sapiens.active project_vectensis <-gpcr$mmds$vectensis.project project_intestinalis <-gpcr$mmds$intestinalis.project mmds.2D.multi(mmds_human,project=list(project_vectensis,project_intestinalis), bary='l',cex.axis=0.01,active.cex = 1, sup.cex = 1,active.lwd=1.5,sup.lwd=3, ensemble.legend.name=c('nemve','inte','human'),legend=FALSE,title='multi_human') # with selected group mmds.2D.multi(mmds_human,project=list(project_vectensis,project_intestinalis), bary='l',cex.axis=0.01,active.cex = 1, sup.cex = 1,active.lwd=1.5,sup.lwd=3, ensemble.legend.name=c('nemve','inte','human'),legend=FALSE,title='multi_human', group.name=c('SO','PEP','OPN','ADENO'),group.col=c("red","forestgreen","orange","maroon"))
# scatter plot of human GPCRs onto the first two axes obtained # from MDS analysis with projection of GPCRs from N.vectensis # and C.intestinalis as supplementary elements: data(gpcr) mmds_human <- gpcr$mmds$sapiens.active project_vectensis <-gpcr$mmds$vectensis.project project_intestinalis <-gpcr$mmds$intestinalis.project mmds.2D.multi(mmds_human,project=list(project_vectensis,project_intestinalis), bary='l',cex.axis=0.01,active.cex = 1, sup.cex = 1,active.lwd=1.5,sup.lwd=3, ensemble.legend.name=c('nemve','inte','human'),legend=FALSE,title='multi_human') # with selected group mmds.2D.multi(mmds_human,project=list(project_vectensis,project_intestinalis), bary='l',cex.axis=0.01,active.cex = 1, sup.cex = 1,active.lwd=1.5,sup.lwd=3, ensemble.legend.name=c('nemve','inte','human'),legend=FALSE,title='multi_human', group.name=c('SO','PEP','OPN','ADENO'),group.col=c("red","forestgreen","orange","maroon"))
Displays a scatter plot of the active elements and, if present, of supplementary elements, after a metric MDS analysis.
mmds.2D.plot(x,project = NULL, title = NULL, axis = c(1, 2), xlim = NULL, ylim = NULL, outfile.type = NULL, outfile.name = "mmds",new.plot = TRUE, active.col = x$col[,3], active.alpha = 1, sup.col = project$col[,3], active.pch = 20, sup.pch = 3, active.lab = FALSE, sup.lab = FALSE, active.cex = 2, sup.cex = 2, active.legend.cex = 2, sup.legend.cex = 2, active.legend.lwd = 1, sup.legend.lwd = 2, active.lwd = 1, sup.lwd = 4, legend = TRUE, active.legend.pos = "bottomleft", sup.legend.pos = "bottomright", active.legend.name = x$group[,1], sup.legend.name = project$group[,1], active.legend.col = x$group[,2], sup.legend.col = project$group[,2], outfile.width = NULL, outfile.height = NULL, box.lwd = 1, cex.axis = 1, cex.lab = 1, sup.legend.text = 1, active.legend.text = 1, legend.axis = TRUE, grid = TRUE, axes = TRUE)
mmds.2D.plot(x,project = NULL, title = NULL, axis = c(1, 2), xlim = NULL, ylim = NULL, outfile.type = NULL, outfile.name = "mmds",new.plot = TRUE, active.col = x$col[,3], active.alpha = 1, sup.col = project$col[,3], active.pch = 20, sup.pch = 3, active.lab = FALSE, sup.lab = FALSE, active.cex = 2, sup.cex = 2, active.legend.cex = 2, sup.legend.cex = 2, active.legend.lwd = 1, sup.legend.lwd = 2, active.lwd = 1, sup.lwd = 4, legend = TRUE, active.legend.pos = "bottomleft", sup.legend.pos = "bottomright", active.legend.name = x$group[,1], sup.legend.name = project$group[,1], active.legend.col = x$group[,2], sup.legend.col = project$group[,2], outfile.width = NULL, outfile.height = NULL, box.lwd = 1, cex.axis = 1, cex.lab = 1, sup.legend.text = 1, active.legend.text = 1, legend.axis = TRUE, grid = TRUE, axes = TRUE)
x |
an object of class 'mmds', obtained from |
project |
an object of class 'project', obtained from |
title |
a string of characters representing the title of the plot. Default is "Metric MDS". |
axis |
a numeric vector of length two representing the principal components displayed on the plot. Default is c(1, 2). |
xlim |
a numeric vector representing the range for the x values. Default is full range. |
ylim |
a numeric vector representing the range for the y values. Default is full range. |
outfile.type |
a string indicating the extension type of the graph outfile. Default is NULL. If not NULL, this should be one of "pdf", "tiff", "png" or "postscript". In this case, the parameter outfile.name and the forthcoming parameters are activated. |
outfile.name |
a string ("mmds", default) indicating the name and directory of pdf graph outfile. The extension file is added automaticaly. See |
new.plot |
a boolean indicating whether a new graphical device should be created (TRUE) or not (FALSE). Default is TRUE. |
active.pch |
an integer indicating the symbol of active elements. Default is 20, corresponding to dots. |
sup.pch |
an integer indicating the symbol of supplementary elements. Default is 3, corresponding to crosses. |
active.col |
a string of characters or character vector representing the color(s) of the active elements. Default is x$col[,3].
It corresponds either to the user-provided colors if the |
active.alpha |
a numeric value indicating the alpha value for opacity of active objects. This value must range from 0 (invisible) to 1 (full opacity). Default is 1. |
sup.col |
a string or character vector representing the color(s) of supplementary elements. Default is project$col[,3].
It corresponds either to the user-provided colors if the |
active.lab |
a boolean indicating whether labels of active elements should be displayed (TRUE) or not (FALSE). Default is FALSE. |
sup.lab |
a boolean indicating whether labels of supplementary elements should be displayed (TRUE) or not (FALSE). Default is FALSE. |
active.cex |
a numeric value indicating the size of the active symbols. Default is 2. |
sup.cex |
a numeric value indicating the size of the supplementary symbols. Default is 2. |
active.legend.cex |
a numeric value indicating the size of active symbols in legend. Default is 2. |
sup.legend.cex |
a numeric value indicating the size of supplementary symbols in legend. Default is 2. |
active.lwd |
a numeric value indicating the width of active symbols. Default is 1. |
sup.lwd |
a numeric value indicating the width of supplementary symbols. Default is 4. |
active.legend.lwd |
a numeric value indicating the width of active symbols in legend. Default is 1. |
sup.legend.lwd |
a numeric value indicating the width of supplementary objects in legend. Default is 4. |
legend |
a boolean indicating whether the legend should be displayed (TRUE) or not (FALSE). Default is TRUE. |
active.legend.pos |
a string indicating the position of the legend for active elements. Default is "topleft". |
sup.legend.pos |
a string indicating the position of the legend for supplementary elements. Default is "topright". |
active.legend.name |
a string vector indicating the names of the |
sup.legend.name |
a string vector indicating the names of the |
active.legend.col |
a string vector indicating the colors of the different |
sup.legend.col |
a string vector indicating the colors of the different |
outfile.width |
a numeric value in inches indicating the width of graph outfile. Default differs by outfile.type. See |
outfile.height |
a numeric value in inches indicating the height of graph outfile. Default differs by outfile.type. The resolution for tiff and png figures is 150 dpi. See |
box.lwd |
a numeric value indicating the border width of graph box and legend box. Default is 1. |
cex.axis |
a numeric value indicating the character size for the x and y axes. Default is 1. |
cex.lab |
a numeric value indicating the character size for the labels of the x and y axes. Default is 1. |
sup.legend.text |
a numeric value indicating the character size of the supplementary tag in legend. Default is 1. |
active.legend.text |
a numeric value indicating the character size of active tag in legend. Default is 1. |
legend.axis |
a boolean indicating whether axis name should be displayed (TRUE) or not (FALSE). Default is TRUE. |
grid |
a boolean indicating whether grid should be displayed (TRUE) or not (FALSE). Default is TRUE. |
axes |
a boolean indicating whether x and y axes should be displayed (TRUE) or not (FALSE). Default is TRUE. |
If mmds.2D.plot
is used after the col.group
function,
the elements are colored by the color scheme provided in the .csv file (see col.group
for details).
If the col.group
function has not been used, the default colors are black and magenta for active and supplementary elements.
mmds.2D.plot
helps identify patterns in data and compare active and supplementary elements.
active.alpha
argument is helpful for visualization of supplementary elements because it allows the symbols of supplementary elements to be in the foreground as compared to active elements.
Produces a scatter plot on the active graphical device.
mmds.2D.plot
requires alpha
function from scales
package.
Julien Pele and Jean-Michel Becu
plot.PCA
function from FactoMineR
package.png
, pdf
, postscript
functions (default R package).
# scatter plot of human GPCRs onto the first two axes obtained from MDS analysis # with projection of GPCRs from D. melanogaster as supplementary elements: data(gpcr) active <- gpcr$dif$sapiens.sapiens mmds_active <- mmds(active,group.file=system.file( "csv/human_gpcr_group.csv",package = "bios2mds")) mmds.2D.plot(mmds_active, active.alpha = 0.5, active.lab = TRUE) # with group information sup <- gpcr$dif$melanogaster.sapiens mmds_sup <- mmds.project(mmds_active, sup,group.file=system.file( "csv/drome_gpcr_group.csv",package = "bios2mds")) mmds.2D.plot(mmds_active,mmds_sup)
# scatter plot of human GPCRs onto the first two axes obtained from MDS analysis # with projection of GPCRs from D. melanogaster as supplementary elements: data(gpcr) active <- gpcr$dif$sapiens.sapiens mmds_active <- mmds(active,group.file=system.file( "csv/human_gpcr_group.csv",package = "bios2mds")) mmds.2D.plot(mmds_active, active.alpha = 0.5, active.lab = TRUE) # with group information sup <- gpcr$dif$melanogaster.sapiens mmds_sup <- mmds.project(mmds_active, sup,group.file=system.file( "csv/drome_gpcr_group.csv",package = "bios2mds")) mmds.2D.plot(mmds_active,mmds_sup)
Displays a 3D plot of the active elements and, if present, of supplementary elements, after a metric MDS analysis.
mmds.3D.plot(x, project = NULL, title = NULL, axis = c(1:3), active.type = "s", sup.type = "p", active.size = 2, radius = 0.005, sup.size = 10, active.col = x$col[,3], sup.col = project$col[,3], box = TRUE, axes = TRUE, new.plot = TRUE, label = TRUE, xlim = NULL, ylim = NULL, zlim = NULL, box.lwd = 2, box.antialias = TRUE, ...)
mmds.3D.plot(x, project = NULL, title = NULL, axis = c(1:3), active.type = "s", sup.type = "p", active.size = 2, radius = 0.005, sup.size = 10, active.col = x$col[,3], sup.col = project$col[,3], box = TRUE, axes = TRUE, new.plot = TRUE, label = TRUE, xlim = NULL, ylim = NULL, zlim = NULL, box.lwd = 2, box.antialias = TRUE, ...)
x |
an object of class 'mmds', obtained from |
title |
a string of characters representing the title of the plot. Default is "Metric MDS". |
project |
an object of class 'project', obtained from |
axis |
a numeric vector of length three representing the principal components displayed on the plot. Default is c(1:3). |
active.type |
an character indicating the symbol of active elements. This should be one of "s" for spheres, "p" for points, "l" for lines, "h" for line segments from z=0 and "n" for none. Default is "s", corresponding to spheres. |
sup.type |
an character indicating the symbol of supplementary elements. This should be one of "s" for spheres, "p" for points, "l" for lines, "h" for line segments from z=0 and "n" for none. Default is "p", corresponding to points. |
active.col |
a string of characters or character vector representing the color(s) of the active elements. Default is x$col[,3].
It corresponds either to the user-provided colors if the |
sup.col |
a string or character vector representing the color(s) of supplementary elements. Default is project$col[,3].
It corresponds either to the user-provided colors if the |
active.size |
a numeric value indicating the size of active symbols. Default is 2. |
sup.size |
a numeric value indicating the size of supplementary symbols. Default is 20. |
box |
a boolean indicating whether the box should be displayed (TRUE) or not (FALSE). Default is TRUE. |
axes |
a boolean indicating whether axes should be displayed (TRUE) or not (FALSE). Default is TRUE. |
radius |
a numeric value indicating the radius of spheres symbols only. If x.type equal to "s" the x.size parameter was inactivated. Default is 0.01. |
new.plot |
a boolean indicating whether a new 3D plot create/replace active 3D device (TRUE) or not to insert in it (FALSE). Default is TRUE. |
label |
a boolean indicating whether the label axes should be displayed (TRUE) or not (FALSE). Default is TRUE. |
xlim |
a numeric vector representing the range for the x values. Default is full range. |
ylim |
a numeric vector representing the range for the y values. Default is full range. |
zlim |
a numeric vector representing the range for the z values. Default is full range. |
box.lwd |
a numeric value indicating the width of box and axes lines. Default is 2. |
box.antialias |
a boolean specifying if box and axes lines should be antiliased. Default is TRUE. |
... |
additional parameters which will be passed to |
If mmds.3D.plot
is used after the col.group
function,
the elements are colored by the color scheme provided in the .csv file (see col.group
for details).
If the col.group
function has not been used, the default colors are black and magenta for active and supplementary elements.
mmds.3D.plot
helps identify patterns in data and compare active and supplementary elements.
Produces a 3D plot on graphical device.
mmds.3D.plot
requires plot3D
function from rgl
package.
See rgl
documentation to supplementary function like snapshot3D
to save image file of 3D device.
Jean-Michel Becu
# 3D plot of human GPCRs onto the first three axes obained from MDS analysis # with projection of GPCRs from D. melanogaster as supplementary elements: data(gpcr) mmds.3D.plot(gpcr$mmds$sapiens.active,active.type="p",label=FALSE,lit=FALSE, point_antialias=TRUE,box.lwd=3,sup.size=4.3,active.size=4.3) bbox3d(shininess=0.5)
# 3D plot of human GPCRs onto the first three axes obained from MDS analysis # with projection of GPCRs from D. melanogaster as supplementary elements: data(gpcr) mmds.3D.plot(gpcr$mmds$sapiens.active,active.type="p",label=FALSE,lit=FALSE, point_antialias=TRUE,box.lwd=3,sup.size=4.3,active.size=4.3) bbox3d(shininess=0.5)
Displays one scree plot and three scatter plots of mmds
results.
mmds.plot(x, project = NULL, new.plot = TRUE, pdf.file = NULL)
mmds.plot(x, project = NULL, new.plot = TRUE, pdf.file = NULL)
x |
an object of class 'mmds', obtained from |
project |
an object of class 'project', obtained from |
new.plot |
a boolean indicating whether a new graphical device should be created (TRUE) or not (FALSE). Default is TRUE. |
pdf.file |
a string indicating the name and directory of the pdf graph outfile. Default is NULL. If this parameter is not
NULL, the parameter |
mmds.plot
is a wrapper calling of both scree.plot
and mmds.2D.plot
.
It produces a 2x2 plot with one scree plot of the relative eigenvalues, in the upper left, and three scatter plots. The three scatter plots are generated as follows:
scatter plot of the elements on the first and second components in the upper right.
scatter plot of the elements on the first and third components in the lower left.
scatter plot of the elements on the second and third components in the lower right.
If object x
contains supplementary elements, they are also projected onto the three scatter plots.
The active and supplementary elements are represented by dots and crosses, respectively.
The color.group
function may be used before calling mmds.plot
to color elements by user-provided groups.
Produces a summary plot of the MDS analysis on the same active graphical device.
The scatter plots can display supplementary objects if their coordinates are present in x
input.
Julien Pele and Jean-Michel Becu
plot.pca
function in bio3d
package.
# summary plot of the MDS analysis of human GPCRs with projection of GPCRs # from D. melanogaster as supplementary elements: data(gpcr) mmds.plot(gpcr$mmds$sapiens.active,gpcr$mmds$melanogaster.project)
# summary plot of the MDS analysis of human GPCRs with projection of GPCRs # from D. melanogaster as supplementary elements: data(gpcr) mmds.plot(gpcr$mmds$sapiens.active,gpcr$mmds$melanogaster.project)
Performs metric MultiDimensional Scaling (MDS) analysis of active elements and projects supplementary elements onto the active space defined by active elements.
mmds.project(mmds, sup, pc = 3, group.file = NULL)
mmds.project(mmds, sup, pc = 3, group.file = NULL)
mmds |
an object of class 'mmds', obtained from |
sup |
a numeric matrix of distances between supplementary and active elements. |
pc |
a numeric value indicating the number of principal components to be saved. Default is 3. |
group.file |
a string of characters to indicate the file name assigning groups and colors to each supplementary element of the sup matrix. |
Metric multidimensional scaling is a statistical analysis technique aimed at analyzing a matrix of distances between 'active' elements. MDS maps the elements onto a low dimensional space (usually 2D or 3D). In this space, elements are represented by points whose respective distances best approximate the initial distance.
In addition, after the metric MDS analysis of active elements, the mmds.project
function allows projecting supplementary elements onto the active space in the context of the R environment. The active space is defined only by the MDS analysis of the active elements. The position of supplementary elements onto the active space depends only on their distances to active elements.
sup
must have some characteristics:
sup
represents the matrix of pairwise distances between supplementary (rows) and active (columns) elements and does not have to be symmetric. The number of supplementary elements may be lower or higher than the number of active objects. The names of supplementary elements must be placed in the left column of sup
.
The method for the computation of metric MDS projection of supplementary data is described by Abdi (2007). Briefly, if N is the number of active sequences and D is the N by N matrix of the squared distances computed from the active matrix, the mmds
function performs the following steps:
Transforms D into a cross-product matrix S:
where I is the N by N identity matrix, m is the N by 1 matrix mass, where each mass equal to and 1 is an N by N matrix of ones.
Transforms S into a factor matrix F:
where M is the diag{m}, U is the eigenvector matrix and is the diagonal matrix of the eigenvalues, such as S =
, where
denotes the transposition operation.
The eigenvectors of S, also called principal components (whose number is smaller or equal to N), form the active space. F gives the coordinates of the active elements in this space.
The supplementary elements are projected onto the active space as described below.
If is the number of supplementary sequences and if
is the
by N
matrix of the squared distances, the
mmds.project
function performs the following steps :
Transforms into a cross-product matrix
:
where is an
by N matrix of ones.
Transforms into a factor matrix
:
gives the coordinates of the supplementary elements in the active space.
A object of class 'project', which is a named list of three elements:
coord |
a named numeric matrix representing the coordinates of active elements. |
group |
a named string matrix representing the differents groups of elements and the associate color (Default is 'NoGroup' and 'black'). |
col |
a named string matrix representing, foreach named elements, the associate group and color (Default is 'NoGroup' and 'black). |
If sup
do not contain names:
A tag "S" followed by an incremented number names the rows of sup
.
The columns of sup
are named as the rows of mmds$D
.
Julien Pele and Jean-Michel Becu
Abdi H (2007) Metric multidimensional scaling. In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 598-605.
For further reading on projection of supplementary elements:
Gower JC (1968) adding a point to vector diagrams in multivariate analysis. Biometrika 55:582-585.
Trosset MW and Pribe CE (2008) The out-of-sample problem for classical multidimensional scaling. Computational statistics & Data analysis 52:4635-4642.
Pele J, Abdi H, Moreau M, Thybert D and Chabbert M (2011) Multidimensional scaling reveals the main evolutionary pathways of class A G-protein-coupled receptors. PloS ONE 6:e19094.
cmdscale
function from stats
package.dudi.pco
, suprow
and supcol
functions from ade4
package.PCA
function from FactoMineR
package.
# performing metric MDS of human GPCRs with projection of # GPCRs from D. melanogaster as supplementary elements: data(gpcr) active <- gpcr$dif$sapiens.sapiens sup <- gpcr$dif$melanogaster.sapiens mmds_active<-mmds(active) mmds_sup <- mmds.project(mmds_active,sup)
# performing metric MDS of human GPCRs with projection of # GPCRs from D. melanogaster as supplementary elements: data(gpcr) active <- gpcr$dif$sapiens.sapiens sup <- gpcr$dif$melanogaster.sapiens mmds_active<-mmds(active) mmds_sup <- mmds.project(mmds_active,sup)
Builds a multiple sequence alignment (MSA) of random sequences.
random.msa(nb.seq = 100, id = "SEQ", nb.pos = 100, gap = FALSE, aa.strict = FALSE, align = NULL, align.replace = TRUE)
random.msa(nb.seq = 100, id = "SEQ", nb.pos = 100, gap = FALSE, aa.strict = FALSE, align = NULL, align.replace = TRUE)
nb.seq |
a numeric value indicating the number of sequences in the random MSA. Default is 100. |
id |
a string of characters used to tag each sequence name. Default is "SEQ". An incremented number is attached to this tag to name each sequence. |
nb.pos |
a numeric value indicating the length of each sequence in the random MSA. Default is 100. |
gap |
a logical value indicating whether the gap character should be considered as a supplementary symbol (TRUE) or not (FALSE). Default is FALSE. |
aa.strict |
a logical value indicating whether only strict amino acids should be taken into account (TRUE) or not (FALSE). Default is FALSE. |
align |
an object of class 'align', obtained from |
align.replace |
a logical value indicating random drawing with replacement (TRUE) or without replacement (FALSE) of characters present in |
random.msa
may be used to compare a reference MSA to a random MSA.
The random MSA must have the same characteristics as the reference MSA (same number of sequences of same length).
A mmds
procedure can be applied to the random MSA to assess the amount of variance due to
random mutations in the reference MSA.
A named list whose objects correspond to random sequences.
The subset
function is used for random selection of the amino acids. If a truly random
procedure is needed, see random
package.
Julien Pele
For an application of random MSA see :
Pele J, Abdi H, Moreau M, Thybert D and Chabbert M (2011) Multidimensional scaling reveals the main evolutionary pathways of class A G-protein-coupled receptors. PLoS ONE 6: e19094. doi:10.1371.
permutation
and synsequence
functions from seqinr
package.
# generating a random sequence alignment with the same characterics # as human GPCRs: aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) nb.seq <- length(aln) nb.pos <- length(aln[[1]]) aln.random <- random.msa(nb.seq = nb.seq, nb.pos = nb.pos)
# generating a random sequence alignment with the same characterics # as human GPCRs: aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds")) nb.seq <- length(aln) nb.pos <- length(aln[[1]]) aln.random <- random.msa(nb.seq = nb.seq, nb.pos = nb.pos)
Displays a bar plot of the eigenvalues obtained by MDS
scree.plot(x, lab = FALSE, title = NULL, xlim = NULL, ylim = NULL, new.plot = TRUE, pdf.file = NULL)
scree.plot(x, lab = FALSE, title = NULL, xlim = NULL, ylim = NULL, new.plot = TRUE, pdf.file = NULL)
x |
a numeric vector representing the raw or the relative eigenvalues of each component. |
lab |
a boolean indicating whether bar labels should be displayed (TRUE) or not (FALSE). Default is FALSE. |
title |
a character string representing the title of the plot. Default is "Scree plot". |
xlim |
a numeric vector representing the range of the x values. Default is full range. |
ylim |
a numeric vector representing the range of the y values. Default is full range. |
new.plot |
a boolean indicating whether a new graphical device should be created (TRUE) or not (FALSE). Default is TRUE. |
pdf.file |
a string indicating the name and directory of the pdf graph outfile. Default is NULL. If this parameter is not NULL, the parameter |
A scree plot is a method for determining the optimal number of components useful to describe the data in the context of metric MultiDimensional Scaling (MDS). The scree plot is an histogram showing the eigenvalues of each component. The relative eigenvalues express the ratio of each eigenvalue to the sum of the eigenvalues. The relative eigenvalue of a component gives the proportion of the data variance explained by this component.
The aim is to evaluate the number of components required to capture most information contained in the data. In a scree plot, the relative eigenvalues decrease when the component number increases. The 'elbow' of the plot determines the optimal number of components to describe the data (usually the components before the 'elbow').
Produces a bar plot on the active graphical device.
The scree plot is not an exclusive method to determine the optimal number of components.
A shepard plot, which is a scatterplot of the scaled MDS distances against the original distance data, can be
another solution. See shepard
function from MASS
package.
Julien Pele
plot.pca.scree
function from bio3d
package.goodness.metaMDS
function from vegan
package.
# displaying the scree plot of the MDS analysis of human GPCRs data(gpcr) active <- gpcr$dif$sapiens.sapiens mmds1 <- mmds(active, pc = 5) scree.plot(mmds1$eigen.perc, lab = TRUE, title = "Scree plot of metric MDS")
# displaying the scree plot of the MDS analysis of human GPCRs data(gpcr) active <- gpcr$dif$sapiens.sapiens mmds1 <- mmds(active, pc = 5) scree.plot(mmds1$eigen.perc, lab = TRUE, title = "Scree plot of metric MDS")
Computes silhouette scores for multiple runs of K-means clustering.
sil.score(mat, nb.clus = c(2:13), nb.run = 100, iter.max = 1000, method = "euclidean")
sil.score(mat, nb.clus = c(2:13), nb.run = 100, iter.max = 1000, method = "euclidean")
mat |
a numeric matrix representing the coordinates of the elements. |
nb.clus |
a numeric vector indicating the range of the numbers of clusters. Default is c(2:13). |
nb.run |
a numeric value indicating the number of runs. Default is 100. |
iter.max |
a numeric value indicating the maximum number of iterations for K-means clustering. Default is 1000. |
method |
a string of characters to determine the distance measure. This should be one of "euclidean" , "maximum", "manhattan", "canberra" or "binary". Default is "euclidean". |
Silhouettes are a general graphical aid for interpretation and validation of cluster analysis.
This technique is available through the silhouette
function (cluster
package). In order to
calculate silhouettes, two types of data are needed:
the collection of all distances between objects. These distances are obtained from
application of dist
function on the coordinates of the elements in mat
with argument method
.
the partition obtained by the application of a clustering technique. In sil.score
context, the partition is obtained from the Kmeans
function (amap
package) with argument
method
which indicates the cluster to which each element is assigned.
For each element, a silhouette value is calculated and evaluates the degree of confidence in the assignment of the element:
well-clustered elements have a score near 1,
poorly-clustered elements have a score near -1.
Thus, silhouettes indicates the objects that are well or poorly clustered. To summarize the results, for each cluster, the silhouettes values can be displayed as an average silhouette width, which is the mean of silhouettes for all the elements assigned to this cluster. Finally, the overall average silhouette width is the mean of average silhouette widths of the different clusters.
Silhouette values offer the advantage that they depend only on the partition of the elements. As a consequence, silhouettes can be used to compare the output of the same clustering algorithm applied
to the same data but for different numbers of clusters. A range of numbers of clusters can be tested, with the nb.clus
argument. The optimal number of clusters is reached for the maximum of the overall
silhouette width. This means that the clustering algorithm reaches a strong clustering structure.
However, for a given number of clusters, the cluster assignment obtained by different K-means runs can be different because the K-means procedure assigns random initial centroids for each run. It may be necessary to run the K-means procedure several times, with the nb.run argument, to evaluate the uncertainty of the results. In that case, for each number of clusters, the mean of the overall average silhouettes for nb.run
K-means runs is calculated. The maximum of this core gives the optimal number of clusters.
A named numeric vector representing the silhouette scores for each number of clusters.
sil.score
requires Kmeans
and silhouette
functions from amap
and
cluster
packages, respectively.
Julien Pele
Rousseeuw PJ (1987) Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. Journal of Computational and Applied Mathematics, 20:53-65.
Lovmar L, Ahlford A, Jonsson M and Syvanen AC (2005) Silhouette scores for assessment of SNP genotype clusters. BMC Genomics, 6:35.
Guy B, Vasyl P, Susmita D and Somnath D (2008) clValid: An R Package for Cluster Validation. Journal of Statistical Software, 25.
connectivity
and dunn
functions from clValid
package.silhouette
function from cluster
package.
# calculating silhouette scores for K-means clustering of human GPCRs: data(gpcr) active <- gpcr$dif$sapiens.sapiens mds <- mmds(active) sil.score1 <- sil.score(mds$coord, nb.clus = c(2:10), nb.run = 100, iter.max = 100) barplot(sil.score1)
# calculating silhouette scores for K-means clustering of human GPCRs: data(gpcr) active <- gpcr$dif$sapiens.sapiens mds <- mmds(active) sil.score1 <- sil.score(mds$coord, nb.clus = c(2:10), nb.run = 100, iter.max = 100) barplot(sil.score1)
Contains eight amino acid substitution matrices, imported from version 9.1 of the aaindex2 database and PAM matrix calculator of Wageningen Bioinformatics Webportal.
data(sub.mat)
data(sub.mat)
A named list with eight elements corresponding to a 20 by 20 named matrix. Rows and columns names correspond to the twenty strict amino acids.
matrix was produced by "pam" Version 1.0.7
matrix was produced by "pam" Version 1.0.7
matrix was produced by "pam" Version 1.0.7
log odds matrix for 250 PAMs (Dayhoff et al., 1978)
substitution matrix (Henikoff-Henikoff, 1992)
substitution matrix (Henikoff-Henikoff, 1992)
substitution matrix (Henikoff-Henikoff, 1992)
substitution matrix (Henikoff-Henikoff, 1992)
substitution matrix (GONNET et al., 1992)
substitution matrix (Jones et al., 1992)
transmembrane protein exchange matrix (Jones et al., 1994)
substitution matrix built from hydrophobic and transmembrane regions of the Blocks database (Ng et al., 2000)
The matrices were downloaded from the AAindex database at http://www.genome.jp/aaindex or were calculated on the PAM server at http://www.bioinformatics.nl/tools/pam.html.
Kawashima S and Kanehisa M (2000) AAindex: amino acid index database. Nucleic Acids Res 28:374.
Dayhoff MO, Schwartz R and Orcutt BC (1978) A model of Evolutionary Change in Proteins. Atlas of protein sequence and structure (volume 5, supplement 3 ed.). Nat. Biomed. Res. Found.. pp. 345-358.
Henikoff S and Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 89:10915-9.
Jones DT, Taylor WR and Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl. Biosci 8:275-282.
Jones DT, Taylor WR and Thornton JM (1994) A mutation data matrix for transmembrane proteins. FEBS Lett 339:269-75.
Ng PC, Henikoff JG and Henikoff S (2000) PHAT: a transmembrane-specific substitution matrix. Predicted hydrophobic and transmembrane. Bioinformatics 16:760-6.
Gonnet GH, Cohen MA and Benner SA (1992) Exhaustive matching of the entire protein sequence database. Science 256:1443-1445.
# loading data(sub.mat) # displaying PAM40: sub.mat$PAM40
# loading data(sub.mat) # displaying PAM40: sub.mat$PAM40
Writes MDS coordinates in the Protein Data Bank format for visualization with a molecular graphics viewer.
write.mmds.pdb(x,project = NULL, axis = c(1, 2, 3), file.pdb = "R.pdb", file.pml=NULL)
write.mmds.pdb(x,project = NULL, axis = c(1, 2, 3), file.pdb = "R.pdb", file.pml=NULL)
x |
an object of class 'mmds', obtained from |
project |
an object of class 'project', obtained from |
axis |
a numeric vector of length three the principal components to be displayed. Default is c(1, 2, 3). |
file.pdb |
a string of characters indicating the output PDB file name. Default is "R.pdb". |
file.pml |
a string of characters indicating the output pml file name for visualization with Pymol. If this parameter is not NULL, the pml file will be written. Default is NULL. |
The elements can be visualized in three dimensions (3D) with a molecular viewer as Pymol or Rasmol.
If x
contains active and supplementary elements, the active and supplementary elements are numbered from 1 and from 5001, respectively. If group is not NULL, the assignment of an element to a group is indicated by the chain name from A for the first group to Z when the maximum number of groups, 26, is reached.
The pml file allows a fancy visualization of the PDB file with the Pymol molecular viewer. The user must first open the PDB file with Pymol, then run the pml file. The active and inactive elements will be displayed as spheres and crosses, respectively, with coloring based on the user-provided colors with the col.group
function.
Produces a PDB file from the MDS coordinates, with the elements numbered in the order of the MSA file and the groups corresponding to the chain numbers. Optionnaly, produces a pml file to add color and group selection in pymol with the pdb file.
Julien Pele and Jean-Michel Becu
http://www.wwpdb.org/docs.html
write.pdb
function from bio3d
package.
# writing the first three MDS coordinates of human GPCRs in a PDB file wd <- tempdir() #wd <- getwd() file1 <- file.path(wd,"sapiens.pdb") file2 <- file.path(wd,"sapiens.pml") data(gpcr) write.mmds.pdb(gpcr$mmds$sapiens.active,file.pdb=file1,file.pml=file2)
# writing the first three MDS coordinates of human GPCRs in a PDB file wd <- tempdir() #wd <- getwd() file1 <- file.path(wd,"sapiens.pdb") file2 <- file.path(wd,"sapiens.pml") data(gpcr) write.mmds.pdb(gpcr$mmds$sapiens.active,file.pdb=file1,file.pml=file2)