Title: | From Biological Sequences and Simulations to Correlation Analysis |
---|---|
Description: | Utilities for computation and analysis of correlation/covariation in multiple sequence alignments and in side chain motions during molecular dynamics simulations. Features include the computation of correlation/covariation scores using a variety of scoring functions between either sequence positions in alignments or side chain dihedral angles in molecular dynamics simulations and utilities to analyze the correlation/covariation matrix through a variety of tools including network representation and principal components analysis. In addition, several utility functions are based on the R graphical environment to provide friendly tools for help in data interpretation. Examples of sequence covariation analysis are provided in: (1) Pele J, Moreau M, Abdi H, Rodien P, Castel H, Chabbert M (2014) <doi:10.1002/prot.24570> and (2) Taddese B, Deniaud M, Garnier A, Tiss A, Guissouma H, Abdi H, Henrion D, Chabbert M (2018) <doi:10.1371/journal.pcbi.1006209>. An example of side chain correlated motion analysis is provided in: Taddese B, Garnier A, Abdi H, Henrion D, Chabbert M (2020) <doi:10.1038/s41598-020-72766-1>. This work was supported by the French National Research Agency (Grant number: ANR-11-BSV2-026) and by GENCI (Grant number: 100567). |
Authors: | Bruck Taddese [aut], Antoine Garnier [aut], Madeline Deniaud [aut], Julien Pele [ctb], Lea Bellenger [ctb], Jean-Michel Becu [ctb], Marie Chabbert [aut, cre] |
Maintainer: | Marie Chabbert <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.2.2 |
Built: | 2024-11-15 08:34:07 UTC |
Source: | CRAN |
The Bios2cor
package is dedicated to the computation and analysis of correlation/covariation between positions in multiple sequence alignments (MSA) and between side chain dihedral angles during molecular dynamics simulations (MD). Features include the computation of correlation/covariation scores using a variety of scoring functions and their analysis through a variety of tools including network representation and principal components analysis. In addition, several utility functions are based on the R graphical environment to provide friendly tools for help in data interpretation.
For clarity purpose, version 2 of the package differentiates scoring functions working on MSA and on MD because their arguments are different. Analysis functions are common with auto detection of MSA or MD.
The main functionalities of Bios2cor
are summarized below:
Methods that can be used to analyze sequence alignments and molecular simulations and to calculate a correlation/covariation matrix containing a score for each pair of positions (sequence alignment) or each pair of dihedral angles (molecular simulations).
Methods working with sequence alignments (fasta or msf file is required):
omes
: calculates the difference between the observed and expected occurrences of each possible pair of amino acids (x, y) at positions i and j of the alignment.
mi
and mip
: calculate a score based on the probability of joint occurrence of events (MI) and a corrected score by substraction of the average product (MIP), respectively.
elsc
: calculates a score based on rigorous statistics of correlation/covariation in a perturbation-based algorithm. It measures how many possible subsets of size n would have the composition found in column j.
mcbasc
: relies on a substitution matrix giving a similarity score for each pair of amino acids.
Methods working with molecular simulations (pdb and dcd files are required) :
dynamic_circular
: calculates a correlation/covariation score based on a circular version of the Pearson correlation coefficient, between each pair of side chain dihedral angles in a trajectory obtained from molecular dynamics simulations.
dynamic_omes
: calculates the difference between the observed and expected occurrences of each possible pair of rotamers (x, y) occuring at side chain dihedral angles i and j in a trajectory.
dynamic_mi
and dynamic_mip
: calculate a score based on the probability of joint occurrence of rotameric states (MI) and a corrected score by substraction of the average product (MIP), respectively.
The methods working with molecular simulations require the following functions :
dynamic_structure
: using the result of the xyz2torsion
function from the bio3D
package, creates a structure that contains side chain dihedral angle informations for each selected frame of the trajectory.
angle2rotamer
: using the result of the dynamic_structure
function, creates a structure that associates rotameric state to each side chain dihedral angle for each selected frame of the trajectory.
Functions that can be used to analyse the results of the correlation/covariation methods :
Entropy functions :
entropy
: calculates an entropy score for each position of the alignment. This score is based on sequence conservation and uses a formula derived from the Shannon's entropy.
dynamic_entropy
: calculates a "dynamic entropy" score for each side chain dihedral angle of a protein during molecular simulations. This score is based on the number of rotameric changes of the dihedral angle during the simulation.
Filters:
delta_filter
: given an entropy object, returns a delta filter for each position of the alignment or each side chain dihedral angle of the protein, based on entropy/dynamic entropy value.
PCA :
centered_pca
: returns a principal component analysis of the double-centered correlation/covariation matrix passed as a parameter. A delta filter can be precised.
Functions that can be used to produce output files.
Some data structures can be stored in txt/csv files :
write.scores
: Using the result of a correlation/covariation method, creates a file containing the score of each pair of positions (sequence alignment analysis) or of side chain dihedral angles (molecular simulations) and optionaly their entropy/dynamic_entropy score.
top_pairs_analysis
: Using the result of a correlation/covariation method and an integer N, creates two files containing (1) the top N pairs with their scores and (2) the individual elements, their contact counts and their entropy score for the top N pairs. Subsequently, these files can be used for network visualization with the Cytoscape program accessible at https://cytoscape.org.
write.pca
: Using the result of the centered_pca
function, creates a file that contains the coordinates of each element in the principal component space.
write.pca.pdb
: Using the result of the centered_pca
function, creates a pdb file with the PCA coordinates on three principal components along with a pml file for nice visualization with the Pymol molecular visualization program accessible at https://pymol.org.
Some data can be visualized in png/pdf files:
scores.boxplot
: Using the result of one or several correlation/covariation methods, creates a boxplot to visualize the distribution of the Z-scores.
network.plot
: Using the result of the top_pairs_analysis
function, creates the graph of a network representation of the data.
scores_entropy.plot
: Using the result of a correlation/covariation method and an entropy structure, creates a graph comparing correlation/covariation scores with entropy values. Each pair of elements (i,j) is placed in the graph with (entropy[i] ; entropy[j]) as coordinates. The color code of each point is based on its correlation/covariation score (red/pink color for top values, blue/skyblue for bottom values).
pca_screeplot
: Using the result of the centered_pca
function, creates the graph of the eigen values (positive values only).
pca_2d
: Using the result of the centered_pca
function, creates a graph with the projection of the elements on two selected components.
angles.plot
: Using pdb and dcd files and the result of a correlation/covariation method, creates graphs to monitor the time evolution of each dihedral angle in the top N pairs
Package: | BioCor |
Type: | Package |
Version: | 2.1 |
Date: | 2020-01-30 |
License: | GPL |
Bruck TADDESE [aut], Antoine Garnier [aut], Madeline DENIAUD [aut], Lea BELLANGER [ctb], Julien PELE[ctb], Jean-Michel BECU [ctb], Marie CHABBERT [cre]. Maintainer: Marie CHABBERT <[email protected]>
#File path for output files wd <- tempdir() #wd <- getwd() file <- file.path(wd,"test_seq1") #Importing multiple sequence alignment msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating correlation object with OMES method omes <- omes(align, gap_ratio = 0.2) #Creating entropy object entropy <- entropy(align, gap_ratio = 0.2) #Creating delta filter based on entropy filter <- delta_filter(entropy, Smin = 0.3, Smax = 0.8) #Selecting a correlation matrix omes <-omes$score # Creating PCA object for selected correlation matrix and storing eigen values in csv file pca <- centered_pca(omes, filepathroot= file, pc= NULL, dec_val= 5, filter= filter)
#File path for output files wd <- tempdir() #wd <- getwd() file <- file.path(wd,"test_seq1") #Importing multiple sequence alignment msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating correlation object with OMES method omes <- omes(align, gap_ratio = 0.2) #Creating entropy object entropy <- entropy(align, gap_ratio = 0.2) #Creating delta filter based on entropy filter <- delta_filter(entropy, Smin = 0.3, Smax = 0.8) #Selecting a correlation matrix omes <-omes$score # Creating PCA object for selected correlation matrix and storing eigen values in csv file pca <- centered_pca(omes, filepathroot= file, pc= NULL, dec_val= 5, filter= filter)
Given an object of class 'structure' and an angle to rotamer conversion file, associates a rotamer to each dihedral angle value. The object of class 'structure' contains dihedral angle values for each side chain dihedral angle and each frame of the trajectory. The conversion file is a reference file that contains the rotamer to be associated with a dihedral angle value, depending on the residue type and the dihedral angle considered. This function will allow to compute correlation/covariation scores between rotameric states.
angle2rotamer(dynamic_structure, conversion_file=system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor"))
angle2rotamer(dynamic_structure, conversion_file=system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor"))
dynamic_structure |
Dihedral angle structure, result of the |
conversion_file |
The file containing the rotamer to be associated with each residue dihedral angle depending of the dihedral angle value. Each line contains five fields, separated by ','. The five fields represent the amino acid name ("R", "N",...), the dihedral angle name("chi1", "chi2",...), the associated rotamer ("g+", "t", "g-"), the start and stop angles (between -180 and 180). For example, for the chi1 angle of the valine residue, a torsion angle between 0 and 120 is associated to rotameric state g+. Default is the "rotamer/dynameomics_rotamers.csv" conversion file provided with the Bios2cor package from the dynameomics database ( |
In the torsion object and in the conversion file, dihedral angle values vary between -180 and 180.
A character matrix containing the rotameric state of each side chain dihedral angle for each frame in the trajectory, depending on the dihedral angle value and on a conversion file.
Antoine GARNIER and Lea BELLENGER
Van der Kamp MW, Schaeffer RD, Jonsson AL, Scouras AD, Simms AM, Toofanny RD, Benson NC, Anderson PC, Merkley ED, Rysavy S, Bromley D, Beck DAC, and Daggett V. Dynameomics: A comprehensive database of protein dynamics. Structure, 18: 423-435, 2010.
#Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Reading conversion file #conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") #Creating dynamic_structure object wanted_frames <- seq(from = 1, to = 40, by = 2) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Creating rotamers object with default conversion_file rotamers <- angle2rotamer(dynamic_structure)
#Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Reading conversion file #conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") #Creating dynamic_structure object wanted_frames <- seq(from = 1, to = 40, by = 2) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Creating rotamers object with default conversion_file rotamers <- angle2rotamer(dynamic_structure)
Given an object of class 'structure' and names of dihedral angles, creates plots of the dihedral angles as a function of frames in a pdf file.
angles.plot(dynamic_structure, filepathroot, angles)
angles.plot(dynamic_structure, filepathroot, angles)
dynamic_structure |
Dihedral angle structure, result of the |
angles |
A vector containing the names of the dihedral angles to be visualized.Default is NULL (all the torsional angles of the dynamic_structure object are taken into account). |
filepathroot |
Root of the full path name for the output file. If NULL, an output file "ANGLES.pdf" is created in tempdir(). If not NULL, a "_ANGLES.pdf" extention is added to the filepathroot. |
The object of class 'structure' contains the side chain dihedral angles (between -180 and 180) for each residue in the protein, for each frame of the molecular simulations. This function allows visualisation of the evolution of selected angles.
returns a pdf file containing the plots of the frame dependance of each element included in argument angles
.
Antoine GARNIER and Marie CHABBERT
#Indicating file path for output files out <- tempdir() file <- file.path(out,"test_dyn1") #Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object for wanted frames wanted_frames <- seq(from= 1, to= 40, by= 2) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Calculating circular correlation between dihedral angles of selected residues wanted_residues <- c("H","N","Q","F","Y","W") dihed_corr <- dynamic_circular(dynamic_structure, wanted_residues) #Selecting a correlation matrix dihed_corr <- dihed_corr$Zscore #Selecting angles of interest (here from the "top_pairs_analysis" function) top_angles <- top_pairs_analysis(dihed_corr, top= 25, file) my_angles <- unlist(top_angles$positions) #Creating plots of the time evolution of the dihedral angles evol_angles <- angles.plot(dynamic_structure, file, my_angles)
#Indicating file path for output files out <- tempdir() file <- file.path(out,"test_dyn1") #Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object for wanted frames wanted_frames <- seq(from= 1, to= 40, by= 2) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Calculating circular correlation between dihedral angles of selected residues wanted_residues <- c("H","N","Q","F","Y","W") dihed_corr <- dynamic_circular(dynamic_structure, wanted_residues) #Selecting a correlation matrix dihed_corr <- dihed_corr$Zscore #Selecting angles of interest (here from the "top_pairs_analysis" function) top_angles <- top_pairs_analysis(dihed_corr, top= 25, file) my_angles <- unlist(top_angles$positions) #Creating plots of the time evolution of the dihedral angles evol_angles <- angles.plot(dynamic_structure, file, my_angles)
Given a correlation/covariation matrix, performs the principal component analysis of the centered matrix.
centered_pca(corr_matrix, filepathroot, filter = NULL, pc= NULL, dec_val= 5)
centered_pca(corr_matrix, filepathroot, filter = NULL, pc= NULL, dec_val= 5)
corr_matrix |
A score or Zscore matrix created by a correlation/covariation function ( |
filepathroot |
The root of the full path name for the csv and png files where eigen values are stored or displayed. By default, two "EIGEN.csv" and "EIGEN.png" files are created in temp(dir). If not NULL, the extensions "_EIGEN.csv" and "_EIGEN.png" are added to the filepathroot. |
filter |
A filter calculated by the |
pc |
A numeric value indicating the number of principal components to be saved. Default is NULL (all the principal components are saved). |
dec_val |
A numeric value corresponding to the precision when the round function is used. Default is 5. |
This function performs a principal component analysis of a correlation/covariation matrix after double centering. It is based on the matrix centering algorithm of the mmds.R
function from the Bios2mds
package. The elements have the same weight except when a delta filter is indicated. In this latter case, only the elements allowed by the delta filter are taken into account.
returns an object of class 'pca' which is a named list of four elements:
eigen |
a numeric vector of the eigenvalues |
eigen.perc |
a numeric vector of the relative eigenvalues (eigenvalues divided by the sum of the absolute eigenvalues) |
coord |
a numeric matrix representing the coordinates of each element of the correlation/covariation matrix in the PCA space |
source |
a named list with 2 elements, the correlation/covariation matrix ( |
returns also two files: a csv file containing eigen values and a png file displaying eigen values.
Antoine GARNIER and Marie CHABBERT
#File path for output files out <- tempdir() file <- file.path(out,"test_seq_pca") #Importing multiple sequence alignment msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating correlation object with OMES method omes <- omes(align, gap_ratio= 0.2) #Creating entropy object entropy <- entropy(align, gap_ratio=0.2) #Creating delta filter based on entropy filter <- delta_filter(entropy, Smin = 0.2, Smax = 0.6) #Selecting a correlation/covariation matrix matrix_omes <-omes$score #Creating PCA object for selected matrix and storing eigen values in csv file pca <- centered_pca(matrix_omes, filepathroot= file, filter = filter)
#File path for output files out <- tempdir() file <- file.path(out,"test_seq_pca") #Importing multiple sequence alignment msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating correlation object with OMES method omes <- omes(align, gap_ratio= 0.2) #Creating entropy object entropy <- entropy(align, gap_ratio=0.2) #Creating delta filter based on entropy filter <- delta_filter(entropy, Smin = 0.2, Smax = 0.6) #Selecting a correlation/covariation matrix matrix_omes <-omes$score #Creating PCA object for selected matrix and storing eigen values in csv file pca <- centered_pca(matrix_omes, filepathroot= file, filter = filter)
Given an entropy object (result of the entropy
or of the dynamic_entropy
function), creates a vector with a delta filter of each element based on the entropy value. The vector will be used to limit the analysis to the elements in the given entropy range in the centered_pca
and top_pairs_analysis
functions.
delta_filter(entropy, Smin = 0, Smax = 1)
delta_filter(entropy, Smin = 0, Smax = 1)
entropy |
An object created by the |
Smin |
A value indicating the minimum entropy value. (Smin = 0 by default) |
Smax |
A value indicating the maximum entropy value. (Smax = 1 by default) |
The object returned by the entropy
or the dynamic_entropy
function contains an entropy score for each element.
The delta weighting of each element is calculated as follow :
A vector that contains a 0 or 1 weighting score for each element (position in sequence alignment or side chain dihedral angle in trajectory) to limit principal component and top pair analysis to elements within a given entropy range.
Antoine GARNIER
#Importing MSA align <- import.msf(system.file("msa/toy_align.msf", package = "Bios2cor")) #Creating entropy object entropy <- entropy(align) #Creating delta filter based on entropy filter <- delta_filter(entropy, Smin = 0.4, Smax = 0.6)
#Importing MSA align <- import.msf(system.file("msa/toy_align.msf", package = "Bios2cor")) #Creating entropy object entropy <- entropy(align) #Creating delta filter based on entropy filter <- delta_filter(entropy, Smin = 0.4, Smax = 0.6)
Calculates circular correlation/covariation scores between side chain dihedral angles during a molecular dynamics trajectory.
dynamic_circular( dynamic_structure, res_selection= c("C","I","L","M","V","R","H","K","D","E","N","Q","F","Y","W","T","S","P") )
dynamic_circular( dynamic_structure, res_selection= c("C","I","L","M","V","R","H","K","D","E","N","Q","F","Y","W","T","S","P") )
dynamic_structure |
Object of class 'structure' that is created by the |
res_selection |
List of amino acids that will be taken into account in the correlation/covariation matrix. By default, all the amino acids are taken into account except Gly and Ala, with no side chain dihedral angles. |
This function uses the cor.circular
function from the circular
package based on a circular version of the Pearson coeefficient.
returns a list of four elements which are numeric matrices containing (1) the correlation/covariation scores for each pair of rotamers (score), (2) the Z-scores for each pair of rotamers (Zscore), (3) the correlation/covariation scores for each pair of rotamers with zero values for autocorrelation (correlation within the same side chain) (score_noauto) and (4) the Z-scores calculated without autocorrelation pairs and zero values for autocorrelation pairs (Zscore_noauto).
Bruck TADESSE, Antoine GARNIER, and Marie CHABBERT
Circular Statistics, from “Topics in circular Statistics” (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.
#Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object for selected frames wanted_frames <- seq(from = 1, to = 40, by = 2) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Computing circular correlation between dihedral angles of selected residues res_selection <- c("H","N","Q","F","Y","W") dihed_corr <- dynamic_circular(dynamic_structure, res_selection)
#Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object for selected frames wanted_frames <- seq(from = 1, to = 40, by = 2) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Computing circular correlation between dihedral angles of selected residues res_selection <- c("H","N","Q","F","Y","W") dihed_corr <- dynamic_circular(dynamic_structure, res_selection)
Measures a "dynamic entropy" or variability score of each dihedral angle based on the number of rotameric changes during the molecular dynamics trajectory.
dynamic_entropy(rotamers)
dynamic_entropy(rotamers)
rotamers |
A character matrix of type 'rotamers' that is produced by the |
The "dynamic entropy" score S is computed by summing the number of rotameric changes over all frames, normalized to the number of frames. It is not a "true entropy" score but gives usefull information on variability of the dihedral angle during the MD simulation.
A numeric vector containing a "dynamic entropy" score for each side chain dihedral angle during the trajectory. The score is comprised between 0 (no change in the rotameric state during the trajectory) and 1 (rotameric change for every frame of the trajectory).
Antoine GARNIER, Lea BELLENGER and Marie CHABBERT
#Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Reading conversion file conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") #Creating the dynamic_structure and rotamers objects wanted_frames <- seq(from = 1, t = 40, by = 5) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) rotamers <- angle2rotamer(dynamic_structure, conversion_file) #creating the dynamic_entropy object dynamic_entropy <- dynamic_entropy(rotamers)
#Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Reading conversion file conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") #Creating the dynamic_structure and rotamers objects wanted_frames <- seq(from = 1, t = 40, by = 5) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) rotamers <- angle2rotamer(dynamic_structure, conversion_file) #creating the dynamic_entropy object dynamic_entropy <- dynamic_entropy(rotamers)
Calculates a mutual information score (MI) based on the probability of joint occurrence of events.
dynamic_mi( dynamic_structure, rotamers, res_selection= c("C","I","L","M","V","R","H","K","D","E","N","Q","F","Y","W","T","S","P") )
dynamic_mi( dynamic_structure, rotamers, res_selection= c("C","I","L","M","V","R","H","K","D","E","N","Q","F","Y","W","T","S","P") )
dynamic_structure |
An object of class 'structure' that is created by the |
rotamers |
A character matrix of type 'rotamers' that is produced by the |
res_selection |
List of amino acids that will be taken into account in the correlation/covariation matrix. By default, all the amino acids are taken into account except Gly and Ala, with no side chain dihedral angles. |
The MI score at position [i,j] has been computed with the following formula :
where is the frequency of the rotamer pair (x,y) at dihedral angles i and j.
returns a list of four elements which are numeric matrices containing (1) the correlation/covariation scores for each pair of rotamers (score), (2) the Z-scores for each pair of rotamers (Zscore), (3) the correlation/covariation scores for each pair of rotamers with zero values for autocorrelation (correlation within the same side chain) (score_noauto) and (4) the Z-scores calculated without autocorrelation pairs and zero values for autocorrelation pairs (Zscore_noauto).
Antoine GARNIER and Marie CHABBERT
Dunn SD, Wahl LM, Gloor GB. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinfor;atics 2008;24:333-340. Martin LC, Gloor GB, Dunn SD, Wahl LM. Using infor;ation theory to search for co-evolving residues in proteins. Bioinformatics 2005;21:4116-4124.
#Reading pdb and dcd files pdb <- system.file("rotamer/tiny_toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/tiny_toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object wanted_frames <- seq(from = 5, to = 40, by = 15) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Creating rotamers object using conversion_file conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") rotamers <- angle2rotamer(dynamic_structure, conversion_file) #Creating correlation object for selected residues using MI method wanted_residues <- c("H","N") mi_corr <- dynamic_mi(dynamic_structure, rotamers, wanted_residues)
#Reading pdb and dcd files pdb <- system.file("rotamer/tiny_toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/tiny_toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object wanted_frames <- seq(from = 5, to = 40, by = 15) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Creating rotamers object using conversion_file conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") rotamers <- angle2rotamer(dynamic_structure, conversion_file) #Creating correlation object for selected residues using MI method wanted_residues <- c("H","N") mi_corr <- dynamic_mi(dynamic_structure, rotamers, wanted_residues)
Calculates a corrected mutual information score (MIP), by substraction of the average product from the probability of joint occurrence of events.
dynamic_mip( dynamic_structure, rotamers, res_selection= c("C","I","L","M","V","R","H","K","D","E","N","Q","F","Y","W","T","S","P") )
dynamic_mip( dynamic_structure, rotamers, res_selection= c("C","I","L","M","V","R","H","K","D","E","N","Q","F","Y","W","T","S","P") )
dynamic_structure |
An object of class 'structure' that is created by the |
rotamers |
A character matrix of type 'rotamers' that is produced by the |
res_selection |
List of amino acids that will be taken into account in the correlation/covariation matrix. By default, all the amino acids are taken into account except Gly and Ala, with no side chain dihedral angles. |
The MIP score at position [i,j] has been computed with the following formula :
with :
and where is the frequency of the rotamer pair (x,y) at dihedral angles i and j.
returns a list of four elements which are numeric matrices containing (1) the correlation/covariation scores for each pair of rotamers (score), (2) the Z-scores for each pair of rotamers (Zscore), (3) the correlation/covariation scores for each pair of rotamers with zero values for autocorrelation (correlation within the same side chain) (score_noauto) and (4) the Z-scores calculated without autocorrelation pairs and zero values for autocorrelation pairs (Zscore_noauto).
Antoine GARNIER and Marie CHABBERT
Dunn SD, Wahl LM, Gloor GB. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinfor;atics 2008;24:333-340. Martin LC, Gloor GB, Dunn SD, Wahl LM. Using infor;ation theory to search for co-evolving residues in proteins. Bioinformatics 2005;21:4116-4124.
#Reading pdb and dcd files pdb <- system.file("rotamer/tiny_toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/tiny_toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object wanted_frames <- seq(from = 5, to = 40, by = 15) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Creating rotamers object using conversion_file conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") rotamers <- angle2rotamer(dynamic_structure, conversion_file) #Creating correlation object for selected residues with MIP method wanted_residues <- c("H","N") mip_corr <- dynamic_mip(dynamic_structure, rotamers, wanted_residues)
#Reading pdb and dcd files pdb <- system.file("rotamer/tiny_toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/tiny_toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object wanted_frames <- seq(from = 5, to = 40, by = 15) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Creating rotamers object using conversion_file conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") rotamers <- angle2rotamer(dynamic_structure, conversion_file) #Creating correlation object for selected residues with MIP method wanted_residues <- c("H","N") mip_corr <- dynamic_mip(dynamic_structure, rotamers, wanted_residues)
Calculates difference between observed and expected occurrences of each possible pair of rotamers (x, y) for i and j dihedral angles over all frames
dynamic_omes( dynamic_structure, rotamers, res_selection= c("C","I","L","M","V","R","H","K","D","E","N","Q","F","Y","W","T","S","P") )
dynamic_omes( dynamic_structure, rotamers, res_selection= c("C","I","L","M","V","R","H","K","D","E","N","Q","F","Y","W","T","S","P") )
dynamic_structure |
An object of class 'structure' that is created by the |
rotamers |
A character matrix that is produced by the angle2rotamer function. The matrix indicates the rotameric state of each side chain dihedral angle for each frame of the trajectory. |
res_selection |
List of amino acids that will be taken into account in the correlation/covariation matrix. By default, all the amino acids are taken into account except Gly and Ala, with no side chain dihedral angles. |
The OMES score for angles [i,j] has been computed with the following formula :
with :
where :
is the number of times that each (x,y) rotamer pair is observed at angles i and j
is the number of times that each (x,y) rotamer pair would be expected, based on the frequency of rotamer x and y at angles i and j
is the number of frames
is the frequency of rotamer x at angle i
is the frequency of rotamer y at angle j
returns a list of four elements which are numeric matrices containing (1) the correlation/covariation scores for each pair of rotamers (score), (2) the Z-scores for each pair of rotamers (Zscore), (3) the correlation/covariation scores for each pair of rotamers with zero values for autocorrelation (correlation within the same side chain) (score_noauto) and (4) the Z-scores calculated without autocorrelation pairs and zero values for autocorrelation pairs (Zscore_noauto).
Antoine GARNIER, Lea BELLENGER, and Marie CHABBERT
Fodor AA and Aldrich RW. Influence of conservation on calculations of amino acid covariance in multiple sequence alignments. Proteins. 2004;56(2):211-21.
#Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object wanted_frames <- seq(from = 5, to = 40, by = 10) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Creating rotamers object using conversion_file conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") rotamers <- angle2rotamer(dynamic_structure, conversion_file) #Creating correlation object for selected residues with OMES method wanted_residues <- c("W") omes_corr <- dynamic_omes(dynamic_structure, rotamers, wanted_residues)
#Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object wanted_frames <- seq(from = 5, to = 40, by = 10) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Creating rotamers object using conversion_file conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") rotamers <- angle2rotamer(dynamic_structure, conversion_file) #Creating correlation object for selected residues with OMES method wanted_residues <- c("W") omes_corr <- dynamic_omes(dynamic_structure, rotamers, wanted_residues)
Given a structure pdb file, a trajectory dcd file and frame indices, gathers information on side chain dihedral angles in a unique structure. This structure will be used in correlation/covariation methods aimed at analyzing side chain rotational motions during molecular dynamics simulations.
dynamic_structure(pdb, trj, frames=NULL)
dynamic_structure(pdb, trj, frames=NULL)
pdb |
Filepath of the pdb file |
trj |
Filepath of trajectory file (dcd file) |
frames |
Indicates the selected frames for the analysis, created with the |
Returns a list of class 'structure' with the following elements containing information on the sequence, structure, trajectory and side chain dihedral angles (values and names) of the protein during the simulation:
pdb |
an object of class 'pdb' created by the |
dcd |
A numeric matrix of xyz coordinates with a frame per row and a Cartesian coordinate per column. Created by the |
xyz |
A numeric matrix of xyz coordinates with a frame per row and a Cartesian coordinate per column. For each frame, the protein coordinates have been fitted on the pdb structure using the |
tor |
A numeric matrix of side chain dihedral angles with a frame per row and a dihedral angle per column. Contains only side chain dihedral angles. Created by the |
tor.names |
a character vector with the names of all side chain chi dihedral angles. They are written as "M.chiN" where M is the residue number and N the dihedral angle chi (chi1, chi2,...). Alanine and Glycine residues which do not have side chain dihedral angles are omitted. |
tor.resno |
a character vector with the residue number M of all side chain chi dihedral angles. |
tor.angle |
a character vector with the dihedral angle (chiN) of all side chain chi dihedral angles. |
nb_torsions |
a numeric value indicating the total number of dihedral angles |
prot.seq |
a character vector with the sequence of the protein |
nb_residues |
a numeric value indicating the number of residues in the protein |
nb_frames |
a numeric value indicating the total number of selected frames |
frames |
a numeric vector indicating the selected frames |
Bruck TADDESE, Antoine GARNIER and Marie CHABBERT
#Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object for selected frames wanted_frames <- seq(from = 1, to = 40, by = 2) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames)
#Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object for selected frames wanted_frames <- seq(from = 1, to = 40, by = 2) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames)
Calculates a score based on rigorous statistics of correlation/covariation in a perturbation-based algorithm. It measures how many possible subsets of size n would have the composition found in column j in the subset alignment defined by the perturbation in column i, and in the ideal subset (i.e., in a subset with the amino acid distribution equal to the total alignment).
elsc(align, gap_ratio = 0.2)
elsc(align, gap_ratio = 0.2)
align |
An object of class 'align' created by the |
gap_ratio |
Numeric value between 0 and 1 indicating the maximal gap ratio at a given position in the MSA for this position to be taken into account. Default is 0.2, positions with more than 20 percent of gaps will not be taken into account in the analysis. When gap_ratio is 1 or close to 1, only positions with at least 1 aa are taken into account (positions with only gaps are excluded). |
The ELSC score at position [i,j] has been computed with the following formula :
As a reminder, a binomial coefficient is computed as follow :
where :
is the number of residues y at position j in the total (unperturbed) sequence alignment
is the number of residues y at position j in the subset alignment defined by the perturbation in column i
is the number of residues y at position j in the ideal subset (i.e., in a subset with the amino acid distribution equal to the total alignment)
A list of two elements which are numeric matrices containing the ELSC scores and Z-scores for each pair of elements.
Madeline DENIAUD and Marie CHABBERT
Dekker JP, Fodor A, Aldrich RW, Yellen G. A perturbation-bqsed method for calculating explicit likelihood of evolutionary covariance in multiple sequence alignements. Bioinformatics 2004;20:1565-1572.
#Importing MSA file align <- import.msf(system.file("msa/toy_align.msf", package = "Bios2cor")) #Creating correlation object with ELSC method for positions with gap ratio < 0.1 elsc <- elsc(align, gap_ratio = 0.1)
#Importing MSA file align <- import.msf(system.file("msa/toy_align.msf", package = "Bios2cor")) #Creating correlation object with ELSC method for positions with gap ratio < 0.1 elsc <- elsc(align, gap_ratio = 0.1)
Measures the entropy score of each position in a sequence alignment
entropy(align, gap_ratio=0.2)
entropy(align, gap_ratio=0.2)
align |
An object created by the |
gap_ratio |
Numeric value between 0 and 1 indicating the maximal gap ratio at a given position in the MSA for this position to be taken into account. 1 is excluded (positions with gaps only). Default is 0.2, positions with more than 20 percent of gaps will not be taken into account in the analysis. When gap_ratio is 1 or close to 1, only positions with at least 1 aa are taken into account (positions with only gaps are excluded). |
The entropy score S at position i has been computed with a formula derived from the Shannon's entropy as follow :
where :
i is the position in the sequence
x is the sequence index
represents the frequency of residue x at position i
A vector containing an entropy value for each position in the alignment
Antoine GARNIER and Marie CHABBERT
Shannon CE. A mathematical theory of communication. Bell Syst Techn J 1948;27:379-423.
#Importing MSA file align <- import.msf(system.file("msa/human_gpcr.msf", package = "Bios2cor")) #creating entropy object for positions with gap ratio < 0.5 entropy <- entropy(align,gap_ratio=0.5)
#Importing MSA file align <- import.msf(system.file("msa/human_gpcr.msf", package = "Bios2cor")) #creating entropy object for positions with gap ratio < 0.5 entropy <- entropy(align,gap_ratio=0.5)
Reads a Multiple Sequence Alignment (MSA) file in FASTA format (.fasta or .fa extension).
import.fasta(file, aa.to.upper = TRUE, gap.to.dash = TRUE)
import.fasta(file, aa.to.upper = TRUE, gap.to.dash = TRUE)
file |
a string of characters to indicate the name of the MSA file to be read. |
aa.to.upper |
a logical value indicating whether amino acids should be converted to upper case (TRUE) or not (FALSE). Default is TRUE. |
gap.to.dash |
a logical value indicating whether the dot (.) and tilde ( |
Initially, FASTA (for FAST-ALL) was the input format of the FASTA program, used for protein comparison and searching in databases. Presently, FASTA format is a standard format for biological sequences.
The FASTA formatted file of a single sequence displays:
a single-line description beginning with a greater-than (>) symbol. The following word is the identifier.
followed by any number of lines, representing biological sequence.
For multiple alignments, the FASTA formatted sequences are concatenated to create a multiple FASTA format.
A object of class 'align', which is a named list whose elements correspond to sequences, in the form of character vectors.
The import.fasta
function was developed for the bios2mds
R package (Julien PELE [aut], Jean-Michel BECU [aut], Marie CHABBERT [cre]).
.
Julien PELE
Pearson WR and Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 27:2444-2448.
# Importing MSA file in FASTA format aln <- import.fasta(system.file("msa/toy2_align.fa", package = "Bios2cor"))
# Importing MSA file in FASTA format aln <- import.fasta(system.file("msa/toy2_align.fa", package = "Bios2cor"))
Reads a Multiple Sequence Alignment (MSA) file in MSF format (.msf extension).
import.msf(file, aa.to.upper = TRUE, gap.to.dash = TRUE)
import.msf(file, aa.to.upper = TRUE, gap.to.dash = TRUE)
file |
a string of characters to indicate the name of the MSA file to be read. |
aa.to.upper |
a logical value indicating whether amino acids should be converted to upper case (TRUE) or not (FALSE). Default is TRUE. |
gap.to.dash |
a logical value indicating whether the dot (.) and tilde ( |
Initially, Multiple Sequence Format (MSF) was the multiple sequence alignment format of the Wisconsin Package (WP) or GCG (Genetic Computer Group). This package is a suite of over 130 sequence analysis programs for database searching, secondary structure prediction or sequence alignment. Presently, numerous multiple sequence alignment editors (Jalview and GeneDoc for example) can read and write MSF files.
MSF file displays several specificities:
a header containing sequence identifiers and characteristics (length, check and weight).
a separator symbolized by 2 slashes (//).
sequences of identifiers, displayed by consecutive blocks.
A object of class 'align', which is a named list whose elements correspond to sequences, in the form of character vectors.
The import.msf
function was developed for the bios2mds
R package (Julien PELE [aut], Jean-Michel BECU [aut], Marie CHABBERT [cre]).
It checks the presence of duplicated identifiers in header. Sequences whose identifiers are missing in header are ignored.
Julien PELE
read.alignment
function from seqinr
package.read.GDoc
function from aaMI
package (archived).
#Importing MSA file in MSF format aln <- import.msf(system.file("msa/toy_align.msf", package = "Bios2cor"))
#Importing MSA file in MSF format aln <- import.msf(system.file("msa/toy_align.msf", package = "Bios2cor"))
Calculates a score for each pair of residus in the sequenec alignment. It relies on a substitution matrix giving a similarity score for each pair of amino acids.
mcbasc(align, gap_ratio= 0.2)
mcbasc(align, gap_ratio= 0.2)
align |
An object of class 'align' created by the |
gap_ratio |
Numeric value between 0 and 1 indicating the maximal gap ratio at a given position in the MSA for this position to be taken into account. Default is 0.2, positions with more than 20 percent of gaps will not be taken into account in the analysis. When gap_ratio is 1 or close to 1, only positions with at least 1 aa are taken into account (positions with only gaps are excluded). |
The McBASC score at position [i,j] has been computed with a formula which was initially proposed by Valencia and coworkers(1) as follow :
where :
is the score for the amino acid pair present in sequences k and l at position i
is the score for the amino acid pair present in sequences k and l at position j
is the average of all the scores
is the average of all the scores
is the standard deviation of all the scores
is the standard deviation of all the scores
A list of two elements which are numeric matrices containing the mcbasc scores and Z-scores for each pair of elements.
Madeline DENIAUD and Marie CHABBERT
(1) Gobel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins 1994;18:309-317.
#Importing MSA file align <- import.msf(system.file("msa/toy_align.msf", package = "Bios2cor")) #Creating correlation object with McBASC method for positions with gap_ratio < 0.2 (Default) mcbasc <- mcbasc(align, gap_ratio = 0.2)
#Importing MSA file align <- import.msf(system.file("msa/toy_align.msf", package = "Bios2cor")) #Creating correlation object with McBASC method for positions with gap_ratio < 0.2 (Default) mcbasc <- mcbasc(align, gap_ratio = 0.2)
Calculates a mutual information score (MI) based on the probability of joint occurrence of events.
mi(align, gap_ratio= 0.2)
mi(align, gap_ratio= 0.2)
align |
An object of class 'align' created by the |
gap_ratio |
Numeric value between 0 and 1 indicating the maximal gap ratio at a given position in the MSA for this position to be taken into account. Default is 0.2, positions with more than 20 percent of gaps will not be taken into account in the analysis. When gap_ratio is 1 or close to 1, only positions with at least 1 aa are taken into account (positions with only gaps are excluded). |
The MI score at position [i,j] has been computed with the following formula :
and where is the frequency of the amino acid pair (x,y) at positions i and j.
N.B. this formula has been widely applied in the field of sequence correlation/covariation but favors pairs with high entropy.
A list of two elements which are numeric matrices containing the MI scores and Z-scores for each pair of elements.
Madeline DENIAUD and Marie CHABBERT
Dunn SD, Wahl LM, Gloor GB. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinfor;atics 2008;24:333-340. Martin LC, Gloor GB, Dunn SD, Wahl LM. Using infor;ation theory to search for co-evolving residues in proteins. Bioinformatics 2005;21:4116-4124.
#Importing MSA file align <- import.fasta(system.file("msa/toy2_align.fa", package = "Bios2cor")) #Creating correlation object with MI method for positions with gap ratio < 0.2 (Default) mi <- mi(align)
#Importing MSA file align <- import.fasta(system.file("msa/toy2_align.fa", package = "Bios2cor")) #Creating correlation object with MI method for positions with gap ratio < 0.2 (Default) mi <- mi(align)
Calculates a corrected mutual information score (MIP), by substraction of the average product from the probability of joint occurrence of events.
mip(align, gap_ratio = 0.2)
mip(align, gap_ratio = 0.2)
align |
An object of class 'align' created by the |
gap_ratio |
Numeric value between 0 and 1 indicating the maximal gap ratio at a given position in the MSA for this position to be taken into account. Default is 0.2, positions with more than 20 percent of gaps will not be taken into account in the analysis. When gap_ratio is 1 or close to 1, only positions with at least 1 aa are taken into account (positions with only gaps are excluded). |
The MIp score at position [i,j] has been computed with the following formula :
with :
and where is the frequency of the amino acid pair (x,y) at positions i and j.
N.B. this formula has been widely applied in the field of sequence correlation/covariation but favors pairs with high entropy.
A list of two elements which are numerical matrices containing the MIP scores and Z-scores for each pair of elements.
Madeline DENIAUD and Marie CHABBERT
Dunn SD, Wahl LM, Gloor GB. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinfor;atics 2008;24:333-340. Martin LC, Gloor GB, Dunn SD, Wahl LM. Using infor;ation theory to search for co-evolving residues in proteins. Bioinformatics 2005;21:4116-4124.
#Importing MSA file align <- import.fasta(system.file("msa/toy2_align.fa", package = "Bios2cor")) #Creating correlation object with MIP method for positions with gap ratio < 0.2 (Default) mip <- mip(align)
#Importing MSA file align <- import.fasta(system.file("msa/toy2_align.fa", package = "Bios2cor")) #Creating correlation object with MIP method for positions with gap ratio < 0.2 (Default) mip <- mip(align)
Given a top_pairs object (result of the top_pairs_analysis
function), creates a network to visualize the elements involved in the top scoring pairs and their links
network.plot(top_pairs, filepathroot)
network.plot(top_pairs, filepathroot)
top_pairs |
An object of class 'top_pairs' created by the |
filepathroot |
The root of the full path name for the output file. If NULL, a "NETWORK.pdf" file will be created in tempdir(). If not NULL, the filepathroot will have the "_NETWORK.pdf" extension. |
A network representing links between elements in the top scoring pairs
Antoine GARNIER
#File path for output file out <- tempdir() file <- file.path(out,"test_omes") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating OMES correlation object for positions with gap ratio < 0.2 omes <- omes(align, gap_ratio= 0.2) #Selecting a correlation matrix omes <-omes$Zscore #Analyzing pairs with top scores and creating 'top_pairs' object top_pairs <- top_pairs_analysis(omes, top = 25, file) #Plotting the network structure of top pairs in pdf file network.plot(top_pairs, file)
#File path for output file out <- tempdir() file <- file.path(out,"test_omes") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating OMES correlation object for positions with gap ratio < 0.2 omes <- omes(align, gap_ratio= 0.2) #Selecting a correlation matrix omes <-omes$Zscore #Analyzing pairs with top scores and creating 'top_pairs' object top_pairs <- top_pairs_analysis(omes, top = 25, file) #Plotting the network structure of top pairs in pdf file network.plot(top_pairs, file)
Calculates the difference between the observed and expected occurrences of each possible pair of amino acids (x, y) at positions i and j of the alignment
omes(align, gap_ratio = 0.2)
omes(align, gap_ratio = 0.2)
align |
An object of class 'align' created by the |
gap_ratio |
Numeric value between 0 and 1 indicating the maximal gap ratio at a given position in the MSA for this position to be taken into account. Default is 0.2, positions with more than 20 percent of gaps will not be taken into account in the analysis. When gap_ratio is 1 or close to 1, only positions with at least 1 aa are taken into account (positions with only gaps are excluded). |
The OMES score at position [i,j] has been computed with the following formula :
with :
where :
is number of times that each (x,y) pair is observed at positions i and j
is number of times that each (x,y) pair would be expected, based on the frequency of residues x and y at positions i and j
is the number of sequences in the alignment with non-gapped residues at positions i and j
is the frequency of amino acid x at position i
is the frequency of amino acid y at position j
A list of two elements which are numerical matrices containing the OMES scores and Z-scores for each pair of elements.
Jean-Miche BECU, Madeline DENIAUD, and Marie CHABBERT
Fodor AA and Aldrich RW. Influence of conservation on calculations of amino acid covariance in multiple sequence alignments. Proteins. 2004;56(2):211-21.
#Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating correlation object with OMES method for positions with gap ratio < 0.2 (Default) omes <- omes(align, gap_ratio = 0.2)
#Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating correlation object with OMES method for positions with gap ratio < 0.2 (Default) omes <- omes(align, gap_ratio = 0.2)
Given an object of class 'pca' (result of the centered_pca
function), draws a graph of the projection of the elements on two dimensions (first and second components by default)
pca_2d(pca_struct, abs= 1, ord= 2, filepathroot)
pca_2d(pca_struct, abs= 1, ord= 2, filepathroot)
pca_struct |
An object created by the |
abs |
An integer which corresponds to the x axis of the projection plane. Default is 1 (first component) |
ord |
An integer which corresponds to the y axis of the projection plane. Default is 2 (second component) |
filepathroot |
The root of the full path name for the output file. If NULL, a "PCA_abs_ord.png" file will be created in tempdir(). If not NULL, the filepathroot will have the "_PCA_abs_ord.png" extension. |
A 2D graph of the projection of the elements on selected two principal components
Antoine GARNIER
#File path for output files out <- tempdir() file <- file.path(out,"test_seq2") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating OMES correlation object omes <- omes(align, gap_ratio = 0.2) #Creating entropy object #entropy <- entropy(align) #Selecting correlation matrix omes <-omes$Zscore #Creating PCA object for selected correlation matrix and saving eigen values pca <- centered_pca(omes, filepathroot= file, pc= NULL, dec_val= 5, filter= NULL) #Creating 2D plot of elements projected on selected [abs,ord] plane pca_2d(pca, abs = 1, ord = 3, file)
#File path for output files out <- tempdir() file <- file.path(out,"test_seq2") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating OMES correlation object omes <- omes(align, gap_ratio = 0.2) #Creating entropy object #entropy <- entropy(align) #Selecting correlation matrix omes <-omes$Zscore #Creating PCA object for selected correlation matrix and saving eigen values pca <- centered_pca(omes, filepathroot= file, pc= NULL, dec_val= 5, filter= NULL) #Creating 2D plot of elements projected on selected [abs,ord] plane pca_2d(pca, abs = 1, ord = 3, file)
Given a PCA structure (result of the centered_pca
function), creates a screeplot of the positive eigen values
pca_screeplot(pca_struct, filepathroot)
pca_screeplot(pca_struct, filepathroot)
pca_struct |
An object created by the |
filepathroot |
The root of the full path name for the output file. If NULL, a "EIGEN.png" file will be created in tempdir(). If not NULL, the filepathroot will have the "_EIGEN.png" extension. |
A screeplot of positive eigen values
Antoine GARNIER
#File path for output files out <- tempdir() file <- file.path(out,"test_seq3") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating OMES correlation object omes <- omes(align, gap_ratio = 0.2) #Selecting correlation matrix omes <-omes$Zscore #Creating PCA object for selected correlation matrix and saving eigen values pca <- centered_pca(omes, filepathroot= file, pc= NULL, dec_val= 5, filter = NULL) #Plotting scree plot pca_screeplot(pca, file)
#File path for output files out <- tempdir() file <- file.path(out,"test_seq3") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating OMES correlation object omes <- omes(align, gap_ratio = 0.2) #Selecting correlation matrix omes <-omes$Zscore #Creating PCA object for selected correlation matrix and saving eigen values pca <- centered_pca(omes, filepathroot= file, pc= NULL, dec_val= 5, filter = NULL) #Plotting scree plot pca_screeplot(pca, file)
Builds a multiple sequence alignment (MSA) of random sequences.
random.msa(nb.seq = 100, id = "SEQ", nb.pos = 100, gap = FALSE, aa.strict = FALSE, align = NULL, align.replace = TRUE)
random.msa(nb.seq = 100, id = "SEQ", nb.pos = 100, gap = FALSE, aa.strict = FALSE, align = NULL, align.replace = TRUE)
nb.seq |
a numeric value indicating the number of sequences in the random MSA. Default is 100. |
id |
a string of characters used to tag each sequence name. Default is "SEQ". An incremented number is attached to this tag to name each sequence. |
nb.pos |
a numeric value indicating the length of each sequence in the random MSA. Default is 100. |
gap |
a logical value indicating whether the gap character should be considered as a supplementary symbol (TRUE) or not (FALSE). Default is FALSE. |
aa.strict |
a logical value indicating whether only strict amino acids should be taken into account (TRUE) or not (FALSE). Default is FALSE. |
align |
an object of class 'align', obtained from |
align.replace |
a logical value indicating random drawing with replacement (TRUE) or without replacement (FALSE) of characters present in |
random.msa
may be used to compare a reference MSA to a random MSA. The random MSA must have the same characteristics as the reference MSA (same number of sequences of same length).
A procedure can be applied to the random MSA to assess the amount of variance due to random mutations in the reference MSA.
The subset
function is used for random selection of the amino acids. If a truly random procedure is needed, see random
package.
A named list whose objects correspond to random sequences.
This function has been initially developped in the bios2mds
R package (Julien PELE [aut], Marie CHABBERT [cre]).
Julien PELE
For an application of random MSA see :
Pele J, Abdi H, Moreau M, Thybert D and Chabbert M (2011) Multidimensional scaling reveals the main evolutionary pathways of class A G-protein-coupled receptors. PLoS ONE 6: e19094. doi:10.1371.
permutation
and synsequence
functions from seqinr
package.
#Importing MSA file aln <- import.fasta(system.file("msa/toy2_align.fa", package = "Bios2cor")) #Generating a random sequence alignment with the same characterics as input file nb.seq <- length(aln) nb.pos <- length(aln[[1]]) aln.random <- random.msa(nb.seq = nb.seq, nb.pos = nb.pos)
#Importing MSA file aln <- import.fasta(system.file("msa/toy2_align.fa", package = "Bios2cor")) #Generating a random sequence alignment with the same characterics as input file nb.seq <- length(aln) nb.pos <- length(aln[[1]]) aln.random <- random.msa(nb.seq = nb.seq, nb.pos = nb.pos)
The aim of this graph is the visualization of the entropy values of the elements in top and bottom pairs.
scores_entropy.plot(entropy, corr_matrix, filepathroot, elite=25, high=275, filter=NULL)
scores_entropy.plot(entropy, corr_matrix, filepathroot, elite=25, high=275, filter=NULL)
entropy |
An object created by the |
corr_matrix |
A correlation/covariation matrix created by one of the correlation/covariation functions: |
filepathroot |
The root of the full path name of the graph that will be created. If NULL, a "ei_ej.pdf" file will be created in tempdir(). If not NULL, the filepathroot will have the "_ei_ej.pdf" extension. |
elite |
An integer to determine the number of pairs with the highest and lowest scores (e.g. 25: pairs ranked 1 to 25 in decreasing or increasing order) to be colored with the "elite" color codes. Default is 25. |
high |
An integer to determine the number of pairs with the next highest and lowest scores (e.g. 275: pairs ranked 26 to 275 in decreasing or increasing order) to be colored with the "high" color codes. Default is 275. |
filter |
A vector created by the |
Using the result of a correlation/covariation method and an entropy structure, creates a graph comparing correlation/covariation scores with entropy values. Each pair of elements (i,j) is placed in the graph with (entropy[i] ; entropy[j]) as coordinates. In the absence of filter, the color code of each point is based on its correlation/covariation score (dark and light blue for top elite and high values, red and pink for bottom elite and high values). In the presence of an entropy based filter, only top elite and high scores are visualized in dark and light blue, respectively,
A graph showing the entropy values of the elements in pairs with top and bottom entropy scores
Julien PELE, Antoine GARNIER and Marie CHABBERT
For an application of these graphs see :
Pele J, Abdi H, Moreau M, Thybert D and Chabbert M (2011) Multidimensional scaling reveals the main evolutionary pathways of class A G-protein-coupled receptors. PLoS ONE 6: e19094. doi:10.1371.
##Example with MSA #File path for output file out <- tempdir() file <- file.path(out,"test_seq7") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating OMES correlation object and selecting a correlation matrix correlation <- omes(align, gap_ratio = 0.2) corr_matrix <- correlation$Zscore #Creating entropy object entropy <- entropy(align) #Creating a delta filter based on entropy filter <- delta_filter(entropy, Smin = 0.4, Smax = 0.7) #Creating the entropy graph scores_entropy.plot(entropy, corr_matrix, filepathroot = file, filter=filter) #Example with MD #File path for output file wd <- tempdir() #wd <-getwd() file <- file.path(wd,"test_dyn7") #Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object for selected frames wanted_frames <- seq(from = 1, to = 40, by = 1) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Creating rotamers object using conversion_file conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") rotamers <- angle2rotamer(dynamic_structure, conversion_file) #Creating the dynamic_entropy and filter objects entropy <- dynamic_entropy(rotamers) filter <- delta_filter(entropy, Smin = 0.0, Smax = 0.1) #Creating correlation object #dyn_cor <- dynamic_circular(dynamic_structure) dyn_cor <- dynamic_omes(dynamic_structure,rotamers) #selection correlation matrix corr_matrix <- dyn_cor$Zscore_noauto #Creating the entropy graph scores_entropy.plot(entropy, corr_matrix, filepathroot = file, filter=filter)
##Example with MSA #File path for output file out <- tempdir() file <- file.path(out,"test_seq7") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating OMES correlation object and selecting a correlation matrix correlation <- omes(align, gap_ratio = 0.2) corr_matrix <- correlation$Zscore #Creating entropy object entropy <- entropy(align) #Creating a delta filter based on entropy filter <- delta_filter(entropy, Smin = 0.4, Smax = 0.7) #Creating the entropy graph scores_entropy.plot(entropy, corr_matrix, filepathroot = file, filter=filter) #Example with MD #File path for output file wd <- tempdir() #wd <-getwd() file <- file.path(wd,"test_dyn7") #Reading pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object for selected frames wanted_frames <- seq(from = 1, to = 40, by = 1) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Creating rotamers object using conversion_file conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") rotamers <- angle2rotamer(dynamic_structure, conversion_file) #Creating the dynamic_entropy and filter objects entropy <- dynamic_entropy(rotamers) filter <- delta_filter(entropy, Smin = 0.0, Smax = 0.1) #Creating correlation object #dyn_cor <- dynamic_circular(dynamic_structure) dyn_cor <- dynamic_omes(dynamic_structure,rotamers) #selection correlation matrix corr_matrix <- dyn_cor$Zscore_noauto #Creating the entropy graph scores_entropy.plot(entropy, corr_matrix, filepathroot = file, filter=filter)
Given a list of correlation/covariation matrices, build boxplots for comparative purposes.
scores.boxplot(corr_matrix_list, name_list, filepathroot, elite=25, high=275)
scores.boxplot(corr_matrix_list, name_list, filepathroot, elite=25, high=275)
corr_matrix_list |
A list of correlation/covariation matrices to be compared |
name_list |
The names of the correlation/covariation matrices |
filepathroot |
The root of the full path name for the output file. if NULL, a BOXPLOT.png file will be created in tempdir(). If not NULL, the "_BOXPLOT.png" extension is added to the filepathroot. |
elite |
An integer to determine the number of pairs with the highest and lowest scores (e.g. 25: pairs ranked 1 to 25 in decreasing or increasing order) to be colored with the "elite" color codes. Default is 25. |
high |
An integer to determine the number of pairs with the next highest and lowest scores (e.g. 275: pairs ranked 26 to 275 in decreasing or increasing order) to be colored with the "high" color codes. Default is 275. |
The correlation/covariation matrices contain the correlation/covariation scores for each pair of elements [i,j]. The boxplots will allow comparing these scores using color codes : the highest values are dark blue, the next highest values are light blue, the lowest values are red and the next lowest values are pink.
A pdf figure with boxplots to compare correlation/covariation scores
Julien PELE and Antoine GARNIER
For an application of these boxplots, see :
Pele J, Abdi H, Moreau M, Thybert D and Chabbert M (2011) Multidimensional scaling reveals the main evolutionary pathways of class A G-protein-coupled receptors. PLoS ONE 6: e19094. doi:10.1371.
#File path for output file out <- tempdir() file <- file.path(out,"test_seq") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating OMES correlation object omes <- omes(align, gap_ratio = 0.2) #Selecting correlation matrices omes <-omes$Zscore #Creating a list of matrices and plotting the boxplots in a graph #One matrix corr_matrix_list <- list(omes) name <- c("omes") scores.boxplot(corr_matrix_list, name, file, 25, 275)
#File path for output file out <- tempdir() file <- file.path(out,"test_seq") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating OMES correlation object omes <- omes(align, gap_ratio = 0.2) #Selecting correlation matrices omes <-omes$Zscore #Creating a list of matrices and plotting the boxplots in a graph #One matrix corr_matrix_list <- list(omes) name <- c("omes") scores.boxplot(corr_matrix_list, name, file, 25, 275)
Given a correlation object, calculates the number of pairs (contacts) each element in the top X pairs are involved in
top_pairs_analysis(corr_matrix, filepathroot, top=25, entropy=NULL, filter=NULL)
top_pairs_analysis(corr_matrix, filepathroot, top=25, entropy=NULL, filter=NULL)
corr_matrix |
One of the matrices created by a correlation/covariation function ( |
filepathroot |
The root of the full path names of the output files for top_pairs_analysis. If NULL, two csv files are created in tempdir(): TOPn_CONTACTS.csv and TOPn_SCORES.csv, where n is the number of top pairs). If not NULL, extentions "_TOPn_CONTACTS.csv" and "_TOPn_SCORES.csv" are added to the filepathroot. |
top |
A integer indicating the number of top pairs used for this analysis. Default is 25. |
entropy |
An object created by the entropy or dynamic_entropy function. Default is NULL. |
filter |
A vector created by the |
This function sorts element pairs by correlation/covariation scores and analyzes the top X pairs to determine the number of pairs (contacts) each element of the top X pairs is involved in. If filter is TRUE, only the scores of elements in the delta filter defined entropy range are taken into account. Results are written as .csv files.
returns an object of class 'top_pairs' which is a named list of four elements for subsequent network representation with the network.plot
function:
pair_i |
a vector containing the name of element i in the ordered top pairs |
pair_j |
a vector containing the name of element j in the ordered top pairs |
positions |
a vector containing the positions in the top n pairs |
contacts |
a vector containing the number of contacts of the positions in the top n pairs |
returns also two .csv files containing scores and contacts of the top n pairs for subsequent network representation with Cytoscape.
Antoine GARNIER and Marie CHABBERT
#File path for output files out <- tempdir() file <- file.path(out,"test_seq") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating entropy object entropy <- entropy(align) #Creating OMES correlation object omes <- omes(align, gap_ratio = 0.2) #Selecting correlation matrix omes <-omes$Zscore #Creating top_pairs object and writing scores and contacts to csv files top_pairs <- top_pairs_analysis(omes, file, top = 25, entropy=entropy)
#File path for output files out <- tempdir() file <- file.path(out,"test_seq") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating entropy object entropy <- entropy(align) #Creating OMES correlation object omes <- omes(align, gap_ratio = 0.2) #Selecting correlation matrix omes <-omes$Zscore #Creating top_pairs object and writing scores and contacts to csv files top_pairs <- top_pairs_analysis(omes, file, top = 25, entropy=entropy)
Given an entropy/dynamic_entropy object, writes each element (position or dihedral angle) and its entropy/dynamic_entropy value in a csv file, and displays the histogram.
write.entropy(entropy, filepathroot )
write.entropy(entropy, filepathroot )
entropy |
An object created by the |
filepathroot |
The root of the full path name for the entropy output file. If NULL, a "ENTROPY.csv" file is created in tempdir(). If not NULL, a "_ENTROPY.csv" extension is added to filepathroot. |
The entropy
function calculates entropy score for each position of an alignment file. The dynamic_entropy
function calculate a "dynamic entropy" score for each side chain dihedral angle of a protein during a molecular simulations.
A csv file containing the elements and their scores. A png file displaying the histogram of the scores.
Antoine GARNIER
#File path for output files out <- tempdir() file <- file.path(out,"test_seq") #Importing multiple sequence alignment align <- import.msf(system.file("msa/human_gpcr.msf", package = "Bios2cor")) #Creating entropy object entropy <- entropy(align, gap_ratio = 0.2) #Writing results to csv file write.entropy(entropy, file)
#File path for output files out <- tempdir() file <- file.path(out,"test_seq") #Importing multiple sequence alignment align <- import.msf(system.file("msa/human_gpcr.msf", package = "Bios2cor")) #Creating entropy object entropy <- entropy(align, gap_ratio = 0.2) #Writing results to csv file write.entropy(entropy, file)
Given an object of class 'pca' (result of the centered_pca
function), stores the coordinates of each element in the PC space in a txt file
write.pca(corr_pca,filepathroot, pc=NULL, entropy= NULL)
write.pca(corr_pca,filepathroot, pc=NULL, entropy= NULL)
corr_pca |
An object created by the |
filepathroot |
The root for the full path name of the output file where all coordinates on all components are stored. If NULL, a "PCA_COORD.csv" file is created in tempdir(). If not NULL, the "_PCA_COORD.csv extension is added to the filepathroot. |
pc |
An integer corresponding to the number of principal components for which coordinates of the elements are saved. By default, this number corresponds to the number of components with positive eigen values. |
entropy |
An object created by the |
The object returned by the centered_pca
function contains coordinates in the PC space for each element.
Each line of the pca file will contain the name of the current element and its coordinates.
Any line that contains Na value for X, Y or Z coordinates will be ignored.
returns a file containing the coordinates of each element in PC space.
Antoine GARNIER
#File path for output files out <- tempdir() file <- file.path(out,"test_seq4") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating OMES correlation object and selecting correlation matrix omes <- omes(align, gap_ratio = 0.2) omes <-omes$Zscore #Creating PCA object for selected matrix pca <- centered_pca(omes, filepathroot= file, filter = NULL, pc= NULL, dec_val= 5) #Saving coordinates of elements in csv file write.pca(pca, file, pc = 10, entropy = NULL)
#File path for output files out <- tempdir() file <- file.path(out,"test_seq4") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating OMES correlation object and selecting correlation matrix omes <- omes(align, gap_ratio = 0.2) omes <-omes$Zscore #Creating PCA object for selected matrix pca <- centered_pca(omes, filepathroot= file, filter = NULL, pc= NULL, dec_val= 5) #Saving coordinates of elements in csv file write.pca(pca, file, pc = 10, entropy = NULL)
Given a PCA structure, creates .pdb and .pml files for 3D visualization with Pymol
write.pca.pdb(corr_pca, filepathroot, trio_comp= c(1:3))
write.pca.pdb(corr_pca, filepathroot, trio_comp= c(1:3))
corr_pca |
An object created by the |
filepathroot |
The root for the full path name of the output PDB and PML files. If NULL, two PCA_l_m_n.pdb and PCA_l_m_n.pml files are created in tempdir(). If not null, '_PCA_l_m_n.pdb' and '_PCA_l_m_n.pml' extensions are added to the filepathroot. The l, m, n parameters are the 3 selected components. |
trio_comp |
A numeric vector of length 3 indicating the principal components to be displayed. Default is c(1, 2, 3). |
This function creates PDB and PML files to visualize the positions of the elements (sequence positions or dihedral angles) in a 3D space corresponding to three selected components of the PCA space. The PDB file can be viewed in any molecular graphics softaware. The PML file allows a nice representation with Pymol (axis, background color, size of points and for GPCRs, color code for helices).
Returns two files: a PDB file which contains three PCA coordinates for each element in PDB format and a PML file for nice visualization with Pymol.
Antoine GARNIER and Marie CHABBERT
# Example for MSA #File path for output files out <- tempdir() file <- file.path(out,"test_seq5") #Importing MSA file align <- import.msf(system.file("msa/toy_align.msf", package = "Bios2cor")) #Creating OMES correlation object and selecting correlation matrix omes <- omes(align, gap_ratio = 0.2) cor_omes <- omes$Zscore #Creating PCA object for selected matrix pca <- centered_pca(cor_omes, filepathroot = file, filter = NULL, pc = NULL, dec_val = 5) #Creating PDB and PML files (open PDB file with Pymol then "File > Run" PML file) indices <- c(1,2,3) write.pca.pdb(pca, file, indices) ### Example for MD #File path for output files wd <- tempdir() #wd <-getwd() file <- file.path(wd,"test_dyn5") #Redaing pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object for selected frames nb_frames_wanted <- 40 wanted_frames <- seq(from = 5, to= nb_frames_wanted, by = 10) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Creating rotamers object conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") rotamers <- angle2rotamer(dynamic_structure, conversion_file) #Creating OMES correlation object and selecting correlation matrix wanted_residues <- c("W","Y","F","N") omes <- dynamic_omes(dynamic_structure, rotamers, wanted_residues) cor_omes <- omes$Zscore_noauto #Creating PCA object for selected matrix pca <- centered_pca(cor_omes, file, filter = NULL, pc = NULL, dec_val = 5) #Creating PDB and PML files (open PDB file with Pymol then "File > Run" PML file) indices <- c(1,2,3) write.pca.pdb(pca, file, indices)
# Example for MSA #File path for output files out <- tempdir() file <- file.path(out,"test_seq5") #Importing MSA file align <- import.msf(system.file("msa/toy_align.msf", package = "Bios2cor")) #Creating OMES correlation object and selecting correlation matrix omes <- omes(align, gap_ratio = 0.2) cor_omes <- omes$Zscore #Creating PCA object for selected matrix pca <- centered_pca(cor_omes, filepathroot = file, filter = NULL, pc = NULL, dec_val = 5) #Creating PDB and PML files (open PDB file with Pymol then "File > Run" PML file) indices <- c(1,2,3) write.pca.pdb(pca, file, indices) ### Example for MD #File path for output files wd <- tempdir() #wd <-getwd() file <- file.path(wd,"test_dyn5") #Redaing pdb and dcd files pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") #Creating dynamic_structure object for selected frames nb_frames_wanted <- 40 wanted_frames <- seq(from = 5, to= nb_frames_wanted, by = 10) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) #Creating rotamers object conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") rotamers <- angle2rotamer(dynamic_structure, conversion_file) #Creating OMES correlation object and selecting correlation matrix wanted_residues <- c("W","Y","F","N") omes <- dynamic_omes(dynamic_structure, rotamers, wanted_residues) cor_omes <- omes$Zscore_noauto #Creating PCA object for selected matrix pca <- centered_pca(cor_omes, file, filter = NULL, pc = NULL, dec_val = 5) #Creating PDB and PML files (open PDB file with Pymol then "File > Run" PML file) indices <- c(1,2,3) write.pca.pdb(pca, file, indices)
Given a correlation object, writes the score, the Zscore and, optionally the entropy, for each pair of elements in a csv file.
write.scores(correlation, filepathroot, entropy= NULL)
write.scores(correlation, filepathroot, entropy= NULL)
correlation |
An object created by a correlation/covariation function ( |
entropy |
An object created by the |
filepathroot |
The root of the full path name for the output file. DEfault is NULL, a "CORR_SCORES.csv" file is created in tempdir(). If not NULL, the "_CORR_SCORES.csv" extension is added to the filepathroot. |
Elements represent positions in multiple sequence alignments and side chain dihedral angles in molecular dynamic simulations (MD).
In sequence analysis, the correlation object contains two matrices with the correlation/covariation scores and Z-scores for each pair of elements [i,j]. If entropy = NULL, each line of the output file will contain element i, element j, score[i,j], and Z-score[i,j]. If entropy is not NULL, each line of the output file will contain element i, element j, score[i,j], Zscore[i,j], entropy[i], and entropy[j].
In MD analysis, the correlation object contains four matrices with (1) the correlation/covariation scores for each pair of rotamers (score), (2) the Z-scores for each pair of rotamers (Zscore), (3) the correlation/covariation scores for each pair of rotamers with zero values for autocorrelation (correlation within the same side chain) (score_noauto) and (4) the Z-scores calculated without autocorrelation pairs and zero values for autocorrelation pairs (Zscore_noauto). If entropy = NULL, each line of the output file will contain element i, element j, score[i,j], Zscore[i,j], score_noauto[i,j], and Zscore_noauto[i,j]. If entropy is not NULL, each line of the output file will contain element i, element j, score[i,j], Zscore[i,j], score_noauto[i,j], Zscore_noauto[i,j], entropy[i], and entropy[j].
A csv file containing the correlation/covariation scores and optionally the entropies.
Antoine GARNIER and Marie CHABBERT
#Example for MSA #File path for output files out <- tempdir() file <- file.path(out,"test_seq") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating correlation and entropy objects correlation <- omes(align, gap_ratio= 0.2) entropy <- entropy(align) #Writing results to csv file write.scores(correlation, file, entropy ) ###Example for MD #File path for output files wd <- tempdir() #wd <-getwd() file <- file.path(wd,"test_dyn") #Reading the pdb and dcd files and the angles to rotamers conversion file pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") #Creating dynamic_structure and rotamers objects wanted_frames <- seq(from = 5, to = 40, by = 10) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) rotamers <- angle2rotamer(dynamic_structure, conversion_file) #Creating correlation and entropy objects wanted_residues <- c("F","H","N","Y","W") #dyn_corr <- dynamic_omes(dynamic_structure, rotamers, wanted_residues) dyn_corr <- dynamic_circular(dynamic_structure, wanted_residues) dyn_entropy <- dynamic_entropy(rotamers) #Writing results to csv file write.scores(dyn_corr, file, dyn_entropy )
#Example for MSA #File path for output files out <- tempdir() file <- file.path(out,"test_seq") #Importing MSA file msf <- system.file("msa/toy_align.msf", package = "Bios2cor") align <- import.msf(msf) #Creating correlation and entropy objects correlation <- omes(align, gap_ratio= 0.2) entropy <- entropy(align) #Writing results to csv file write.scores(correlation, file, entropy ) ###Example for MD #File path for output files wd <- tempdir() #wd <-getwd() file <- file.path(wd,"test_dyn") #Reading the pdb and dcd files and the angles to rotamers conversion file pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") conversion_file <- system.file("rotamer/dynameomics_rotamers.csv", package= "Bios2cor") #Creating dynamic_structure and rotamers objects wanted_frames <- seq(from = 5, to = 40, by = 10) dynamic_structure <- dynamic_structure(pdb, trj, wanted_frames) rotamers <- angle2rotamer(dynamic_structure, conversion_file) #Creating correlation and entropy objects wanted_residues <- c("F","H","N","Y","W") #dyn_corr <- dynamic_omes(dynamic_structure, rotamers, wanted_residues) dyn_corr <- dynamic_circular(dynamic_structure, wanted_residues) dyn_entropy <- dynamic_entropy(rotamers) #Writing results to csv file write.scores(dyn_corr, file, dyn_entropy )
Convert cartesian coordinate matrix to torsion angles with
function torsion.pdb
.
xyz2torsion(pdb, xyz, tbl = c("basic", "mainchain", "sidechain", "all", "phi", "psi", paste("chi", 1:5, sep="")), ncore = NULL)
xyz2torsion(pdb, xyz, tbl = c("basic", "mainchain", "sidechain", "all", "phi", "psi", paste("chi", 1:5, sep="")), ncore = NULL)
pdb |
A PDB structure object as obtained from |
xyz |
Cartesian coordinates as a Mx(3N) matrix. |
tbl |
Names of torsion angles to calculate. |
ncore |
Number of CPU cores used to do the calculation. By default (NULL), use all detected CPU cores. |
Available values for the argument ‘tbl’ include:
Basic: "phi", "psi", "chi1".
Mainchain: "phi", "psi".
Sidechain: "chi1", "chi2", "chi3", "chi4", "chi5".
All: all of the above.
Each individual angle.
Returns a MxP matrix, where M is the number of frames and P the number of valid torsion angles.
New function from the bio3d package, available at <https://github.com/Grantlab/bio3d/blob/master/new_funs/xyz2torsion.R>
Xin-Qiu Yao
Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.
pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") pdb <- read.pdb(pdb) trj <- read.dcd(trj) #Selecting only "CA" atoms ca.inds <- atom.select(pdb, elety = "CA") #Getting xyz coordinates using fit.xyz form bio3d package xyz <- fit.xyz(fixed = pdb$xyz, mobile = trj,fixed.inds=ca.inds$xyz,mobile.inds=ca.inds$xyz) frames <- seq(from= 1, to= 40, by= 2) #Creating torsion object for side chains using xyz2torsion function from bio3d package tor <- xyz2torsion(pdb, xyz[frames,], tbl = "sidechain", ncore= 1)
pdb <- system.file("rotamer/toy_coordinates.pdb", package= "Bios2cor") trj <- system.file("rotamer/toy_dynamics.dcd", package= "Bios2cor") pdb <- read.pdb(pdb) trj <- read.dcd(trj) #Selecting only "CA" atoms ca.inds <- atom.select(pdb, elety = "CA") #Getting xyz coordinates using fit.xyz form bio3d package xyz <- fit.xyz(fixed = pdb$xyz, mobile = trj,fixed.inds=ca.inds$xyz,mobile.inds=ca.inds$xyz) frames <- seq(from= 1, to= 40, by= 2) #Creating torsion object for side chains using xyz2torsion function from bio3d package tor <- xyz2torsion(pdb, xyz[frames,], tbl = "sidechain", ncore= 1)