Title: | Unbinned Weighted Histogram Analysis Method (UWHAM) |
---|---|
Description: | A method for estimating log-normalizing constants (or free energies) and expectations from multiple distributions (such as multiple generalized ensembles). |
Authors: | Zhiqiang Tan and Emilio Gallicchio |
Maintainer: | Zhiqiang Tan <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1 |
Built: | 2024-12-02 06:30:27 UTC |
Source: | CRAN |
UWHAM for estimating log-normalizing constants (or free energies) and expectations from multiple distributions (such as multiple generalized ensembles).
The R package UWHAM
– version 1.0 can be used for two main tasks:
to estimate log-normalizing constants (or free energies) from multiple distributions,
to estimate expectations from multiple distributions.
There are 3 main functions:
uwham()
: to estimate free energies and associated variances,
uwham.phi()
: to estimate expectations and associated variances,
uwham.boot()
: to estimate variances based on block bootstrap.
The package also provides a function, histw()
, for plotting weighted histograms.
Time series of interaction parameters and corresponding binding energies from a Hamiltonian hopping simulation of molecular binding.
data(cyclooctanol)
data(cyclooctanol)
A data frame containing 2 columns and 35000 rows.
The dataset contains ligand-receptor interaction parameters (lambda) and binding energies for the equivalent of an Hamiltonian hopping simulation of cyclooctanol binding to beta-cyclodextrin. (Hamiltonian hopping is the same as simulated tempering except that potential parameters are varied rather than temperature.) This illustrative dataset was constructed by concatenating the output of some of the replicas of a Hamiltonian replica exchange simulation of the same complex (Wickstrom et al. 2013). Harmless discontinuities may be observed at the joining points.
The dataset contains 2 columns and 3500 rows. Rows correspond to
consecutive time points. The first column cyclooctanol$V1
contains lambda values (24 discrete values between 0 and 1) and the second column
cyclooctanol$V2
records binding energy values in kcal/mol.
Wickstrom, L., Gallicchio, E., He, P., and Levy, R.M. (2013) "Large scale affinity calculations of cyclodextrin host-guest complexes: Understanding the role of reorganization in the molecular recognition process," Journal of Chemical Theory and Computation, submitted for publication.
This function plots a weighted histogram.
histw(x, w, xaxis, xmin, xmax, ymax, bar=TRUE, add=FALSE, col="black", dens=TRUE)
histw(x, w, xaxis, xmin, xmax, ymax, bar=TRUE, add=FALSE, col="black", dens=TRUE)
x |
A data vector. |
w |
A weight vector, which will be rescaled to sum up to one. |
xaxis |
A vector of cut points. |
xmin |
The minimum of |
xmax |
The maximum of |
ymax |
The maximum of |
bar |
bar plot (if |
add |
if |
col |
color of lines. |
dens |
if |
Tan, Z., Gallicchio, E., Lapelosa, M., and Levy, R.M. (2012) "Theory of binless multi-state free energy estimation with applications to protein-ligand binding," Journal of Chemical Physics, 136, 144102.
# Boltzmann constant bet <- 1.0/(0.001986209*300.0) # negative potential function npot.fcn <- function(x, lam) -bet*lam*x # read data (soft.core) lam <- c(0.0, 0.001, 0.002, 0.004, 0.006, 0.008, 0.01, 0.02, 0.06, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0) m <- length(lam) data(ligand2.soft) lig.data <- ligand2.soft$V1 size <- rep(1000, m) N <- sum(size) # compute negative potential neg.pot <- matrix(0, N,m) for (j in 1:length(lam)) neg.pot[,j] <- npot.fcn(x=lig.data, lam=lam[j]) # estimate free energies out <- uwham(logQ=neg.pot, size=size, fisher=TRUE) -out$ze/bet + 0.71 sqrt(out$ve)/bet # the bins used to construct weighted histograms bins <- seq(-35, 400, 2.5) grid <- c(bins, 2e+3) W <- out$W /N par(mfrow=c(1,2)) # Plot the weighted histograms at two lambdas histw(lig.data, w=W[,8], xaxis=grid, xmin=-35, xmax=200, ymax=.1) histw(lig.data, w=W[,10], xaxis=grid, xmin=-35, xmax=200, ymax=.1, bar=0, add=TRUE, col="red") # plot the raw histogram and weighted histogram histw(lig.data[ (5*1000+1):(6*1000) ], w=rep(1,1000), xaxis=grid, xmin=-35, xmax=200, ymax=.04) histw(lig.data, w=W[,6], xaxis=grid, xmin=-35, xmax=200, ymax=.04, bar=0, add=TRUE, col="red")
# Boltzmann constant bet <- 1.0/(0.001986209*300.0) # negative potential function npot.fcn <- function(x, lam) -bet*lam*x # read data (soft.core) lam <- c(0.0, 0.001, 0.002, 0.004, 0.006, 0.008, 0.01, 0.02, 0.06, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0) m <- length(lam) data(ligand2.soft) lig.data <- ligand2.soft$V1 size <- rep(1000, m) N <- sum(size) # compute negative potential neg.pot <- matrix(0, N,m) for (j in 1:length(lam)) neg.pot[,j] <- npot.fcn(x=lig.data, lam=lam[j]) # estimate free energies out <- uwham(logQ=neg.pot, size=size, fisher=TRUE) -out$ze/bet + 0.71 sqrt(out$ve)/bet # the bins used to construct weighted histograms bins <- seq(-35, 400, 2.5) grid <- c(bins, 2e+3) W <- out$W /N par(mfrow=c(1,2)) # Plot the weighted histograms at two lambdas histw(lig.data, w=W[,8], xaxis=grid, xmin=-35, xmax=200, ymax=.1) histw(lig.data, w=W[,10], xaxis=grid, xmin=-35, xmax=200, ymax=.1, bar=0, add=TRUE, col="red") # plot the raw histogram and weighted histogram histw(lig.data[ (5*1000+1):(6*1000) ], w=rep(1,1000), xaxis=grid, xmin=-35, xmax=200, ymax=.04) histw(lig.data, w=W[,6], xaxis=grid, xmin=-35, xmax=200, ymax=.04, bar=0, add=TRUE, col="red")
This inserts a value x0
at d
-th position of x
.
insert(x, d, x0 = 0)
insert(x, d, x0 = 0)
x |
A vector. |
d |
A position in |
x0 |
A value to be inserted. |
The resulting vector.
A dataset for protein-ligand binding with an unmodified energy function.
data(ligand2.hard)
data(ligand2.hard)
A data frame containing 1 column and 18000 rows.
The dataset contains binding energies (in kcal/mol) for the ligand 2/FKBP complex with an unmodified energy function, simulated using replica exchange with 18 lambda values (Tan et al. 2012).
Suppose that the dataset is converted to a 1000 x 18
matrix as
follows.
data(ligand2.hard) lig.data <- matrix(ligand2.hard$V1, nrow=1000)
The 18 columns of lig.data
correspond to the raw output of the 18 replicas. A replica is defined as a continuous
replica exchange molecular dynamics thread with variable lambda (exchanges are
performed by swapping lambda values rather than coordinates). The 1000
rows represent as many sequential time points during the replica
exchange trajectory. Each row contains binding energies from each
replica taken at the same simulation time (synchronous sampling).
Tan, Z., Gallicchio, E., Lapelosa, M., and Levy, R.M. (2012) "Theory of binless multi-state free energy estimation with applications to protein-ligand binding," Journal of Chemical Physics, 136, 144102.
A dataset for protein-ligand binding with a soft energy function.
data(ligand2.soft)
data(ligand2.soft)
A data frame containing 1 column and 15000 rows.
The dataset contains binding energies (in kcal/mol) for the ligand 2/FKBP complex with a soft-core interaction energy function, simulated using replica exchange with 15 lambda values (Tan et al. 2012).
Suppose that the dataset is converted to a 1000 x 15
matrix as follows.
data(ligand2.soft) lig.data <- matrix(ligand2.soft$V1, nrow=1000)
The 15 columns of lig.data
correspond to the 15 lambda states
from smallest (lambda=0) to largest (lambda=1). The data were obtained
by reordering by lambda the raw output of the molecular
dynamics replicas at each observation time. A replica is defined as a continuous
replica exchange molecular dynamics thread with variable lambda (exchanges are
performed by swapping lambda values rather than coordinates). The 1000
rows represent as many sequential time points during the replica
exchange trajectory. Each row contains binding energies from each
replica taken at the same simulation time (synchronous sampling).
Tan, Z., Gallicchio, E., Lapelosa, M., and Levy, R.M. (2012) "Theory of binless multi-state free energy estimation with applications to protein-ligand binding," Journal of Chemical Physics, 136, 144102.
This function computes the objective function, its gradient and its hessian matrix for UWHAM.
obj.fcn(ze, logQ, size, base)
obj.fcn(ze, logQ, size, base)
ze |
A vector of log-normalizing constants (or free energies) for the sampled thermodynamic states. |
logQ |
The matrix of log of unnormalized density ratios for the sampled thermodynamic states over the baseline. |
size |
A vector giving the individual sample sizes for the sampled thermodynamic states. |
base |
The baseline index. |
The objective function is convex as discussed in Tan et al. (2012). See also Gill et al. (1998) for related results on biased sampling models.
value |
The value of the objective function. |
gradient |
The gradient of the objective function. |
hessian |
The hessian matrix of objective function. |
Gill, R., Vardi, Y., and Wellner, J. (1998) "Large sample theory of empirical distributions in biased sampling models," Annals of Statistics, 16, 1069–1112.
Tan, Z., Gallicchio, E., Lapelosa, M., and Levy, R.M. (2012) "Theory of binless multi-state free energy estimation with applications to protein-ligand binding," Journal of Chemical Physics, 136, 144102.
This function implements UWHAM for estimating log-normalizing constants (or free energies) and expectations from multiple distributions (such as multiple generalized ensembles) as described in Tan et al. (2012).
uwham(label=NULL, logQ, size=NULL, base = NULL, init = NULL, fisher = TRUE)
uwham(label=NULL, logQ, size=NULL, base = NULL, init = NULL, fisher = TRUE)
label |
A vector of length |
logQ |
|
size |
A vector of length |
base |
The baseline index, between |
init |
A vector of length |
fisher |
Logical; if |
The UWHAM method results from a number of interesting, sometimes independent, developments in physics and statistics. See Kong et al. (2003) for a formal statistical treatment along with earlier references, and Tan et al. (2012) for a more accessible account, presenting the method as a binless extension of the Weighted Histogram Analysis Method (WHAM) widely known in physics and chemistry (e.g., Ferrenberg and Swendsen 1989). The possibility of obtaining free energies from a binless extension of WHAM was noticed by various authors (e.g., Kumar et al. 1992; Newman & Barkema 1999). The binless method was later reintroduced by Shirts & Chodera (2008) to the physics literature and was called the Multi-state Bennet Acceptance Ratio method (MBAR) because it can be interpreted as a multi-state extension of the Bennet Acceptance Ratio (BAR) method. An implementation of MBAR in the Python language developed by Shirts & Chodera is freely available (see https://simtk.org/home/pymbar). The UWHAM package, while adopting an alternative numerical approach, provides identical point estimates compared to the MBAR Python package. In addition, the UWHAM package provides variance estimation based on variance formulas without using generalized inverses or, for correlated data, by block bootstrap.
A typical application of UWHAM involves the computation of the relative free energies of a series of thermodynamic states differing in environmental conditions (temperature for example) and/or Hamiltonian parameters (such as the strength of biasing potentials) from data collected at these thermodynamic states. The method takes as input the reduced energies (such as the inverse temperature times the negative of the potential energy for canonical ensembles differing in temperature) of the observations at all thermodynamic states of interest. In addition to the free energies, the output includes estimates of the thermodynamic weights of the observations for all states to be used for thermodynamic reweighting calculations.
The UWHAM method is statistically optimal in yielding the smallest asymptotic variances, provided that the individual samples are independent and the observations in each sample are also independent (Tan 2004).
To compute point estimates, the method is implemented here by minimizing a convex objective function, as described in Tan et al. (2012). This approach can be more effective than solving the nonlinear equations by the self-consistency or the Newton-Raphson algorithm. Currently, the optimization is done by using the R package trust.
Variance estimation provided here is based on the Fisher information or the Sandwich variance formula,
as presented in Tan et al. (2012).
In contrast with the Sandwich formula, the Fisher information based formula does not require labels indicating
which thermodynamic state each observation is obtained from (see the dataset ligand2.hard
).
The analytical variance formulas are consistent when the observations are considered independent.
Alternatively, variance estimation can be done by block bootstrap, implemented in uwham.boot
.
ze |
The vector of estimated log-normalizing constants (or log of the partition functions). |
ve |
The vector of estimated variances for |
Ve |
The estimated variance-covariance matrix for |
W |
The |
check |
The column averages of |
out |
The output of trust used to minimize the objective function; see help(trust). |
label |
Same as argument |
size |
Same as argument |
base |
Same as argument |
Ferrenberg, A.M. and Swendsen, R.H. (1989) "Optimized Monte Carlo data analysis," Physics Review Letters, 63, 1195-1198.
Kong, A., McCullagh, P., Meng, X.-L., Nicolae, D., and Tan, Z. (2003) "A theory of statistical models for Monte Carlo integration" (with discussion), Journal of the Royal Statistical Society, Ser. B, 65, 585-618.
Kumar, S., Bouzida, D., Swendsen, R.H., Kollman, P.A. and Rosenberg, J.M. (1992) "The Weighted Histogram Analysis Method for free-energy calculations on biomolecules. I. The method," Journal of Computational Chemistry, 13, 1011-1021.
Newman, M.E.J. and Barkema, G.T. (1999) Monte Carlo Methods in Statistical Physics, Oxford University Press, New York.
Shirts, M.R. and J. D. Chodera, J.D. (2008) "Statistically optimal analysis of samples from multiple equilibrium states," Journal of Chemical Physics, 129, 124105.
Tan, Z. (2004) "On a likelihood approach for Monte Carlo integration," Journal of the American Statistical Association, 99, 1027-1036.
Tan, Z., Gallicchio, E., Lapelosa, M., and Levy, R.M. (2012) "Theory of binless multi-state free energy estimation with applications to protein-ligand binding," Journal of Chemical Physics, 136, 144102.
####################################### ############## example 1 ############## ####################################### # This example illustrates the calculation of the standard free energy # of binding of a ligand to a protein receptor by means of an alchemical # perturbation potential of the form lambda*binding_energy(r), where # lambda is a scaling parameter (lambda=0 corresponds to the # protein-ligand uncoupled state, and lambda=1 corresponds to the # coupled state) and binding_energy(r) is the (solvent averaged) # potential energy of conformation r of the complex relative to one in # which the receptor and the ligand are rigidly displaced at infinite # separation. See Gallicchio et al., J. Chem. Theory Comput. 6, # 2961-2977 (2010), and Tan et al. J. Chem. Phys., 136, 144102 (2012). # Inverse temperature beta, in kcal/mol^-1 bet <- 1.0/(0.001986209*300.0) # negative reduced potential function -beta*lambda*binding_energy. # "x" is the binding energy of a structure of the complex and "lam" the # value of lambda at which to compute the reduced energy npot.fcn <- function(x, lam) -bet*lam*x # values of lambda for the calculation with a "hard-core" potential lam <- c(0.0, 0.000000001, 0.00000001, 0.0000001, 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 0.15, 0.25, 0.35, 0.5, 0.6, 0.75, 0.9, 1.0) #number of alchemical states m <- length(lam) # load binding energies data(ligand2.hard) lig.data <- ligand2.hard$V1 # 1000 observations at each lambda state size <- rep(1000, m) # total sample size N <- sum(size) # compute negative potential of each observation at each lambda state neg.pot <- matrix(0, N,m) for (j in 1:length(lam)) neg.pot[,j] <- npot.fcn(x=lig.data, lam=lam[j]) # estimate free energies using UWHAM out <- uwham(logQ=neg.pot, size=size, fisher=TRUE) # convergence diagnosis: the elements should be equal to 1 out$check # the "ze" values are dimensionless free energies, that is the # log of the partition functions (log Z). # To obtain thermodynamic free energies, multiply by -kT. # 0.71 kcal/mol is the standard state correction; see # Lapelosa et al. J Chem Theory Comput, 8, 44-60 (2012). -out$ze/bet + 0.71 # variances of free energies from Fisher information sqrt(out$ve)/bet # perform block bootstrap for free energies to take into account # time correlations in the binding energy data # To save time for package checking, this is not run. #out.boot <- uwham.boot(proc.type="parallel", block.size=50, boot.size=100, # logQ=neg.pot, size=size) # #-out.boot$ze/bet + 0.71 #sqrt(out.boot$ve)/bet # estimation of average binding energies and variances # at lambda = 0.6, 0.75, 0.9, 1.0 state <- 15:18 out.phi <- uwham.phi(phi=lig.data, state=state, out.uwham=out, fisher=TRUE) out.phi$phi out.phi$phi.v # block bootstrap for both free energies and expectations # To save time for package checking, this is not run. #out.boot <- uwham.boot(proc.type="parallel", block.size=50, boot.size=100, # logQ=neg.pot, size=size, # phi=lig.data, state=state) # #out.boot$phi #out.boot$phi.v ################################################ ## example 2 (unequal and zero sample sizes) ### ################################################ # same calculation as above but with a "soft-core" potential and # illustrating the ability to compute free energies and expectations # for states with unequal sample sizes including those with # zero sample sizes (or states that have not been sampled). # See Tan et al. J. Chem. Phys., 136, 144102 (2012). rm(list=ls()) # inverse temperature bet <- 1.0/(0.001986209*300.0) # negative potential function npot.fcn <- function(x, lam) -bet*lam*x # read data (soft core) lam <- c(0.0, 0.001, 0.002, 0.004, 0.006, 0.008, 0.01, 0.02, 0.06, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0) m <- length(lam) data(ligand2.soft) lig.data <- ligand2.soft$V1 ### unequal and zero sample sizes size <- c( rep(1000, 5), rep(500, 3), rep(0, 2), rep(1000,5)) subs <- c(rep(TRUE, 5000), rep(c(rep(TRUE,500),rep(FALSE,500)), 3), rep(FALSE, 2000), rep(TRUE,5000)) lig.data <- lig.data[subs] N <- sum(size) # compute negative potential neg.pot <- matrix(0, N,m) for (j in 1:length(lam)) neg.pot[,j] <- npot.fcn(x=lig.data, lam=lam[j]) # estimate free energies out <- uwham(logQ=neg.pot, size=size, fisher=TRUE) -out$ze/bet + 0.71 sqrt(out$ve)/bet # block bootstrap for free energies, # pretending that the data are generated from independent chains. # To save time for package checking, this is not run. #out.boot <- uwham.boot(proc.type="indep", # block.size=rep(50,m-2), boot.size=100, # logQ=neg.pot, size=size) # #-out.boot$ze/bet + 0.71 #sqrt(out.boot$ve)/bet ####################################### ## example 3 (serial tempering data) ## ####################################### rm(list=ls()) # inverse temperature bet <- 1.0/(0.001986209*300.0) # negative potential function npot.fcn <- function(x, lam) -bet*lam*x # lambda states lam <- c(0.0,0.001,0.002,0.004,0.005,0.006,0.008,0.01,0.02,0.04, 0.07,0.1,0.25,0.5,0.55,0.6,0.65,0.7,0.75,0.8, 0.85,0.9,0.95,1.0) m <- length(lam) # loads cyclooctanol dataset data(cyclooctanol) lig.data <- cyclooctanol$V2 # sample size N <- length(lig.data) # state labels based on lambda values # note that labels=1:m, not 0:(m-1) state.labels <- factor(cyclooctanol$V1, labels=1:m) # compute negative potential neg.pot <- matrix(0, N,m) for (j in 1:m) neg.pot[,j] <- npot.fcn(x=lig.data, lam=lam[j]) # estimate free energies, note that size=NULL because label is given out <- uwham(label=state.labels, logQ=neg.pot, fisher=TRUE) # free energies as a function of lambda, 0.36 kcal/mol is a standard # state correction -out$ze/bet + 0.36 sqrt(out$ve)/bet # block bootstrap for free energies, note that proc.type="serial" # for simulated tempering data. # To save time for package checking, this is not run. #out.boot <- uwham.boot(proc.type="serial", block.size=10, boot.size=100, # label=state.labels, logQ=neg.pot) # #-out.boot$ze/bet + 0.36 #sqrt(out.boot$ve)/bet
####################################### ############## example 1 ############## ####################################### # This example illustrates the calculation of the standard free energy # of binding of a ligand to a protein receptor by means of an alchemical # perturbation potential of the form lambda*binding_energy(r), where # lambda is a scaling parameter (lambda=0 corresponds to the # protein-ligand uncoupled state, and lambda=1 corresponds to the # coupled state) and binding_energy(r) is the (solvent averaged) # potential energy of conformation r of the complex relative to one in # which the receptor and the ligand are rigidly displaced at infinite # separation. See Gallicchio et al., J. Chem. Theory Comput. 6, # 2961-2977 (2010), and Tan et al. J. Chem. Phys., 136, 144102 (2012). # Inverse temperature beta, in kcal/mol^-1 bet <- 1.0/(0.001986209*300.0) # negative reduced potential function -beta*lambda*binding_energy. # "x" is the binding energy of a structure of the complex and "lam" the # value of lambda at which to compute the reduced energy npot.fcn <- function(x, lam) -bet*lam*x # values of lambda for the calculation with a "hard-core" potential lam <- c(0.0, 0.000000001, 0.00000001, 0.0000001, 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 0.15, 0.25, 0.35, 0.5, 0.6, 0.75, 0.9, 1.0) #number of alchemical states m <- length(lam) # load binding energies data(ligand2.hard) lig.data <- ligand2.hard$V1 # 1000 observations at each lambda state size <- rep(1000, m) # total sample size N <- sum(size) # compute negative potential of each observation at each lambda state neg.pot <- matrix(0, N,m) for (j in 1:length(lam)) neg.pot[,j] <- npot.fcn(x=lig.data, lam=lam[j]) # estimate free energies using UWHAM out <- uwham(logQ=neg.pot, size=size, fisher=TRUE) # convergence diagnosis: the elements should be equal to 1 out$check # the "ze" values are dimensionless free energies, that is the # log of the partition functions (log Z). # To obtain thermodynamic free energies, multiply by -kT. # 0.71 kcal/mol is the standard state correction; see # Lapelosa et al. J Chem Theory Comput, 8, 44-60 (2012). -out$ze/bet + 0.71 # variances of free energies from Fisher information sqrt(out$ve)/bet # perform block bootstrap for free energies to take into account # time correlations in the binding energy data # To save time for package checking, this is not run. #out.boot <- uwham.boot(proc.type="parallel", block.size=50, boot.size=100, # logQ=neg.pot, size=size) # #-out.boot$ze/bet + 0.71 #sqrt(out.boot$ve)/bet # estimation of average binding energies and variances # at lambda = 0.6, 0.75, 0.9, 1.0 state <- 15:18 out.phi <- uwham.phi(phi=lig.data, state=state, out.uwham=out, fisher=TRUE) out.phi$phi out.phi$phi.v # block bootstrap for both free energies and expectations # To save time for package checking, this is not run. #out.boot <- uwham.boot(proc.type="parallel", block.size=50, boot.size=100, # logQ=neg.pot, size=size, # phi=lig.data, state=state) # #out.boot$phi #out.boot$phi.v ################################################ ## example 2 (unequal and zero sample sizes) ### ################################################ # same calculation as above but with a "soft-core" potential and # illustrating the ability to compute free energies and expectations # for states with unequal sample sizes including those with # zero sample sizes (or states that have not been sampled). # See Tan et al. J. Chem. Phys., 136, 144102 (2012). rm(list=ls()) # inverse temperature bet <- 1.0/(0.001986209*300.0) # negative potential function npot.fcn <- function(x, lam) -bet*lam*x # read data (soft core) lam <- c(0.0, 0.001, 0.002, 0.004, 0.006, 0.008, 0.01, 0.02, 0.06, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0) m <- length(lam) data(ligand2.soft) lig.data <- ligand2.soft$V1 ### unequal and zero sample sizes size <- c( rep(1000, 5), rep(500, 3), rep(0, 2), rep(1000,5)) subs <- c(rep(TRUE, 5000), rep(c(rep(TRUE,500),rep(FALSE,500)), 3), rep(FALSE, 2000), rep(TRUE,5000)) lig.data <- lig.data[subs] N <- sum(size) # compute negative potential neg.pot <- matrix(0, N,m) for (j in 1:length(lam)) neg.pot[,j] <- npot.fcn(x=lig.data, lam=lam[j]) # estimate free energies out <- uwham(logQ=neg.pot, size=size, fisher=TRUE) -out$ze/bet + 0.71 sqrt(out$ve)/bet # block bootstrap for free energies, # pretending that the data are generated from independent chains. # To save time for package checking, this is not run. #out.boot <- uwham.boot(proc.type="indep", # block.size=rep(50,m-2), boot.size=100, # logQ=neg.pot, size=size) # #-out.boot$ze/bet + 0.71 #sqrt(out.boot$ve)/bet ####################################### ## example 3 (serial tempering data) ## ####################################### rm(list=ls()) # inverse temperature bet <- 1.0/(0.001986209*300.0) # negative potential function npot.fcn <- function(x, lam) -bet*lam*x # lambda states lam <- c(0.0,0.001,0.002,0.004,0.005,0.006,0.008,0.01,0.02,0.04, 0.07,0.1,0.25,0.5,0.55,0.6,0.65,0.7,0.75,0.8, 0.85,0.9,0.95,1.0) m <- length(lam) # loads cyclooctanol dataset data(cyclooctanol) lig.data <- cyclooctanol$V2 # sample size N <- length(lig.data) # state labels based on lambda values # note that labels=1:m, not 0:(m-1) state.labels <- factor(cyclooctanol$V1, labels=1:m) # compute negative potential neg.pot <- matrix(0, N,m) for (j in 1:m) neg.pot[,j] <- npot.fcn(x=lig.data, lam=lam[j]) # estimate free energies, note that size=NULL because label is given out <- uwham(label=state.labels, logQ=neg.pot, fisher=TRUE) # free energies as a function of lambda, 0.36 kcal/mol is a standard # state correction -out$ze/bet + 0.36 sqrt(out$ve)/bet # block bootstrap for free energies, note that proc.type="serial" # for simulated tempering data. # To save time for package checking, this is not run. #out.boot <- uwham.boot(proc.type="serial", block.size=10, boot.size=100, # label=state.labels, logQ=neg.pot) # #-out.boot$ze/bet + 0.36 #sqrt(out.boot$ve)/bet
This function implements variance estimation based on block bootstrap for UWHAM estimates of free energies and expectations.
uwham.boot(proc.type, block.size, boot.size, seed = 0, label = NULL, logQ, size=NULL, base = NULL, init = NULL, phi = NULL, state = NULL)
uwham.boot(proc.type, block.size, boot.size, seed = 0, label = NULL, logQ, size=NULL, base = NULL, init = NULL, phi = NULL, state = NULL)
proc.type |
Type of simulation, |
block.size |
A vector of length |
boot.size |
The number of bootstrap replications. |
seed |
Seed for random number generation. |
label |
A vector of length |
logQ |
|
size |
A vector of length |
base |
The baseline index, between |
init |
A vector of length |
phi |
A vector of function values on the pooled sample; if |
state |
A vector of indices for the thermodynamic states under which expectations are to be computed;
if |
The use of block bootstrap requires at least two more inputs than uwham
: the type of simulation proc.type
and the time ordering of the observations.
If proc.type="indep"
, the data are assumed to be generated by (approximately) independent chains at different thermodynamic states. The observations corresponding to the rows of logQ
are assumed to be ordered by thermodynamic state and by simulation time within each state.
To perform block bootstrap, data blocks are resampled within each thermodynamic state and then pooled to build bootstrap samples.
If proc.type="parallel"
, the data are assumed to be generated by (synchronous) parallel tempering or replica exhanges. Equal sample sizes are required. The observations corresponding to the rows of logQ
are assumed to be ordered by simulation time and then by either thermodynamic state (as in the dataset ligand2.soft
) or replica (as in the dataset ligand2.hard
). The synchronized block bootstrap is implemented such that blocks corresponding to the same time interval are randomly selected from all the thermodynamic states (Tan et al. 2012).
If proc.type="serial"
, the data are assumed to be generated by (single-chain) serial tempering or simulated tempering. The observations corresponding to the rows of logQ
are assumed to be ordered by time. The thermodynamic labels must then be provided in label
. To perform block bootstrap, data blocks are resampled from the entire chain, and the number of observations at each state is re-computed for each bootstrap sample.
ze |
The vector of averages of estimated free energies over the bootstrap samples, which can differ from the output |
ve |
The vector of estimated variances for the output |
phi |
The vector of averages of estimated expectations over the bootstrap samples (which can differ from the output |
phi.v |
The vector of estimated variances for |
Tan, Z., Gallicchio, E., Lapelosa, M., and Levy, R.M. (2012) "Theory of binless multi-state free energy estimation with applications to protein-ligand binding," Journal of Chemical Physics, 136, 144102.
#See the examples for uwham().
#See the examples for uwham().
This function implements UWHAM for estimating expectations for multiple distributions (such as multiple generalized ensembles) as described in Tan et al. (2012).
uwham.phi(phi, state, out.uwham, fisher = TRUE)
uwham.phi(phi, state, out.uwham, fisher = TRUE)
phi |
A vector of function (or observable) values on the pooled sample. |
state |
A vector of indices between |
out.uwham |
The output of |
fisher |
Logical; if |
The implementation is directly based on Tan et al. (2012). See the details for uwham
.
phi |
The vector of estimated expectations. |
phi.v |
The vector of estimated variances for |
phi.V |
The variance-covariance matrix for |
Tan, Z., Gallicchio, E., Lapelosa, M., and Levy, R.M. (2012) "Theory of binless multi-state free energy estimation with applications to protein-ligand binding," Journal of Chemical Physics, 136, 144102.
#See the examples for uwham().
#See the examples for uwham().