Journal of Molecular Biology
Regular article
Discrimination of the native from misfolded protein models with an energy function including implicit solvation¹
Introduction
Because of the increasing number of protein sequences being determined and the need to predict their structures, there is at present an intense effort to devise potential energy functions that can distinguish native from misfolded proteins. The use of such functions is based on the thermodynamic hypothesis, which states that a native protein adopts the conformation, or set of very similar conformations, that minimizes the free energy of the system. The equilibrium probability distribution of protein conformations, $p(q)$, is given by (e.g. Gibson & Scheraga, 1969; Karplus & Shakhnovich, 1992; Lazaridis & Karplus, 1999):

$$p(q) = \frac{\exp[-W(q)/k_B T]}{\int \exp[-W(q')/k_B T]\,\mathrm{d}q'} \qquad (1)$$

where the effective energy $W(q)$ of conformation $q$ can be written:

$$W(q) = E_{\mathrm{intra}}(q) + \Delta G_{\mathrm{solv}}(q) \qquad (2)$$

Here $E_{\mathrm{intra}}(q)$ is the intraprotein energy and $\Delta G_{\mathrm{solv}}(q)$ is the solvation free energy. A stable protein under physiological conditions populates a narrow range of conformations (within about 2 Å RMS) that together make up the native state (Brooks et al., 1988). The effective energy surface in the neighborhood of the native state is a complex multiminimum surface, as has been demonstrated by experiments (Frauenfelder et al., 1991) and simulations (Caves et al., 1998; Elber & Karplus, 1987; Levitt, 1983; Noguti & Go, 1989). Since the entropic contribution to the free energy of the native state can be approximated by that of a single or a few conformations (Karplus et al., 1987), whereas the unfolded state comprises a multitude of conformations, the native-state minimum must be very deep; i.e. there must be a large gap between the energy of the native state and that of the average non-native state whose structure differs significantly from it (Bryngelson & Wolynes, 1987; Karplus & Shakhnovich, 1992; Moult, 1997; Sali et al., 1994). Any function that is useful for representing the effective energy of proteins must therefore satisfy two conditions: the native state must be lowest in energy, and there must be a sizeable energy gap.
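To make the energy-gap argument concrete, the following Python sketch evaluates the Boltzmann distribution p(q) ∝ exp[−W(q)/kBT] over a small, discrete set of conformations, with W(q) = Eintra(q) + ΔGsolv(q). All energies, the temperature of 300 K, and the function names are hypothetical choices for illustration; a real calculation integrates over a continuous conformational space rather than summing over a handful of states.

```python
import math

K_B = 0.0019872  # Boltzmann constant, kcal/(mol*K)

def effective_energy(e_intra, dg_solv):
    """W(q) = E_intra(q) + dG_solv(q), in kcal/mol."""
    return e_intra + dg_solv

def boltzmann_populations(energies, temperature=300.0):
    """Equilibrium probabilities p(q) for a discrete set of conformations.

    The integral over conformations in the denominator is replaced by a sum;
    the lowest energy is subtracted first to avoid overflow/underflow.
    """
    kt = K_B * temperature
    w_min = min(energies)
    weights = [math.exp(-(w - w_min) / kt) for w in energies]
    z = sum(weights)
    return [w / z for w in weights]

def native_population(gap, n_non_native, temperature=300.0):
    """Population of one native conformation lying `gap` kcal/mol below each
    of `n_non_native` (hypothetical) non-native conformations."""
    energies = [0.0] + [gap] * n_non_native
    return boltzmann_populations(energies, temperature)[0]

w_native = effective_energy(-1200.0, -350.0)  # hypothetical values
w_decoy = effective_energy(-1190.0, -340.0)   # hypothetical values
for gap in (1.0, w_decoy - w_native):
    print(f"gap = {gap:5.1f} kcal/mol -> native population {native_population(gap, 1000):.3f}")
```

With a 1 kcal/mol gap and 1,000 competing non-native conformations, the native population comes out well below 1%, whereas a 20 kcal/mol gap leaves it essentially at 1. This is why a large gap, not merely a lower native energy, is required.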
Much of the recent effort in the search for such effective energy functions has focused on the analysis of known protein structures, supplemented by assumptions concerning the form of the potential function (Bowie et al., 1991; Casari & Sippl, 1992; DeBolt & Skolnick, 1996; Koehl & Delarue, 1994; Reva et al., 1997). In many cases, such as the work described by Bowie et al. (1991), the term “effective energy function” is interpreted rather broadly, since the function depends only on the “environment” of each atom. Although such empirical potentials have been used with considerable success in fold recognition (Marchler-Bauer et al., 1997), particularly if no additions or deletions are allowed, they do not account for distortion of the covalent structure or for steric crowding. In a recent comparison (Marti-Renom et al., 1998), for example, several empirical potentials failed to recognize a badly misfolded structure of potato carboxypeptidase inhibitor. Levitt and co-workers (Park & Levitt, 1996; Park et al., 1997) have tested a number of empirical functions and found none of them to be completely satisfactory in distinguishing incorrect conformations from the native structure. Others have focused on discriminating structures in the vicinity of the native structure, rather than grossly misfolded structures (Wang et al., 1995; Williams et al., 1992).
One reason for the recent emphasis on statistically based criteria is the widespread belief that “molecular mechanics” potential energy functions of the type used in simulations (e.g. Brooks et al., 1983) cannot distinguish between native and misfolded structures. A study by Novotny et al. (1984), a precursor of modern threading studies (Finkelstein, 1997), is most frequently cited in support of this thesis. In that study, two proteins with the same number of residues but different folds were considered, and the sequence of each was “threaded” onto the fold of the other. This created two pairs of correct and incorrect folds. After side-chains were built onto the incorrect models, the energy calculated with the CHARMM potential was minimized to remove bad contacts and then compared with the energy of the correct structure. The conclusion of the study, which seems not to have been fully understood, was that the energy calculated after mild minimization to eliminate bad contacts (RMS shift of 0.5 to 0.9 Å) was a “reasonable energy” for a native protein of the size of the test system; i.e. from the absolute energy alone it was not obvious that the misfolded structure was wrong. However, when the energies of the native and misfolded structures were compared, the calculations did, in fact, discriminate the native fold, in spite of many statements to the contrary (e.g. Eisenberg & McLachlan, 1986).
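The decision rule in a Novotny-type test is relative rather than absolute: for each sequence, the energy of its native fold is compared with the energy of the same sequence threaded onto the other fold. The sketch below encodes only that comparison. The class and function names and all energies are hypothetical, and side-chain building and minimization are assumed to have been done beforehand.

```python
from dataclasses import dataclass

@dataclass
class ThreadingPair:
    """Energies (kcal/mol) of one sequence in its native fold and threaded
    onto the other protein's fold, both after mild minimization."""
    sequence: str
    e_native: float
    e_misfolded: float

    @property
    def gap(self) -> float:
        # Positive gap: the misfolded model lies above the native in energy.
        return self.e_misfolded - self.e_native

    @property
    def discriminated(self) -> bool:
        return self.gap > 0.0

def novotny_test(pair_ab: ThreadingPair, pair_ba: ThreadingPair) -> dict:
    """Swapping the sequences of two equal-length proteins gives two comparisons."""
    return {p.sequence: p.discriminated for p in (pair_ab, pair_ba)}

# Hypothetical energies for illustration only.
a = ThreadingPair("protein_A", e_native=-950.0, e_misfolded=-820.0)
b = ThreadingPair("protein_B", e_native=-910.0, e_misfolded=-930.0)
print(novotny_test(a, b))
```

Note that neither absolute energy on its own says anything about correctness; only the sign of each gap does.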
Since there appears to be little or no evidence that “physics-based” energy functions (Moult, 1997) are “worse” than statistically based functions, it seems worthwhile to test a function of the former type more extensively. As is evident from equation (2), it is the effective energy, which includes solvation, that should be used both in minimizing the structure and in calculating its energy. One specific problem that can arise in vacuum calculations is that conformations with the polar groups inside and the non-polar groups on the surface (“reverse proteins”) are more stable, because interactions between polar groups are strong and there is no desolvation free energy cost for burying them in the protein interior (Novotny et al., 1984). Such an arrangement could occur in non-polar environments, such as membranes, but it is the opposite of what is observed in proteins that are stable in aqueous solution, where the polar groups tend to be on the surface and the non-polar groups are buried.
Explicit modeling of the solvent, although very useful in molecular dynamics (MD) simulations (Brooks et al., 1988), is not suitable for protein structure prediction because of the large amount of computer time required to survey a range of conformations. For this purpose, one needs a function for the solvation free energy, ideally analytical with analytical derivatives, that can be calculated rapidly. Wesson & Eisenberg (1992), for example, combined empirical atomic solvation parameters with the polar hydrogen CHARMM 19 potential energy function (Brooks et al., 1983; Neria et al., 1996) and performed dynamics simulations on melittin. Stouten et al. (1993) presented a model based on contacts rather than accessible surface areas, combined it with the GROMOS energy function (van Gunsteren & Berendsen, 1987), and performed simulations on BPTI. They found the model to be a significant improvement over vacuum simulations, although the RMSD from the crystal structure was somewhat larger than in explicit water simulations. Fraternali & van Gunsteren (1996) developed an empirical solvation potential adjusted to reproduce the experimental radius of gyration of proteins in MD simulations. Friesner and co-workers (Humphreys et al., 1995; Monge et al., 1995) used the AMBER force field with the Generalized Born model (Still et al., 1990) to evaluate protein structures and found that the resulting function sometimes gave non-native structures a lower effective energy than the native structure. Augspurger & Scheraga (1996) recently simplified the hydration shell model (Kang et al., 1987) by including only double overlap terms in the calculation of the hydration shell volumes, which improves computational efficiency. They compared the solvation free energy calculated with this model to that obtained from the Poisson-Boltzmann equation (Augspurger & Scheraga, 1997), but did not report extensive tests of the combined energy function.
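The following sketch illustrates why a solvation term penalizes “reverse proteins”. It uses the simplest surface-area form based on atomic solvation parameters, ΔGsolv = Σi σi Ai, summed over atom types with solvent-exposed area Ai. The parameters σi and the areas are hypothetical numbers chosen only to show the sign of the effect: exposing carbon is unfavorable, and exposing N and O is favorable. This is not EEF1, which uses a volume-based model.

```python
# Hypothetical atomic solvation parameters sigma_i, cal/(mol*A^2):
# positive = exposure costs free energy, negative = exposure is favorable.
SIGMA = {"C": 16.0, "N": -6.0, "O": -6.0}

def asp_solvation(exposed_area):
    """dG_solv = sum_i sigma_i * A_i over atom types, returned in kcal/mol."""
    return sum(SIGMA[atom] * area for atom, area in exposed_area.items()) / 1000.0

# Hypothetical solvent-exposed areas (A^2) for two models of the same sequence.
native_like = {"C": 2000.0, "N": 800.0, "O": 1200.0}  # polar groups outside
reverse = {"C": 4000.0, "N": 100.0, "O": 200.0}       # polar groups buried

print(f"native-like: {asp_solvation(native_like):.1f} kcal/mol")
print(f"reverse:     {asp_solvation(reverse):.1f} kcal/mol")
```

A vacuum energy contains no term of this kind. With it, the reverse model in this toy example becomes about 42 kcal/mol less favorable than the native-like one.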
This is true for most physics-based effective energy functions presented to date: they have not been tested for their ability to discriminate native from non-native structures in the spirit of Novotny et al. (1984), or for their ability to give stable native states in room-temperature molecular dynamics simulations. Recently, Vorobjev et al. (1998) performed a limited Novotny-type test with a hybrid approach on nine of the 22 proteins of the EMBL set (see below). They generated ensembles of conformations by molecular dynamics simulations in explicit solvent, starting from both the native and the misfolded conformation. They then evaluated these conformations using a molecular mechanics energy function complemented by three solvation terms: one for the free energy of cavity formation, one for protein-solvent dispersion interactions, and one for electrostatic polarization, evaluated by continuum electrostatics methods. They found that the average effective energy was always lower for the native structure. A stricter test of the proposed solvation model would be to use it in generating the ensemble of conformations as well as in evaluating them.
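A minimal sketch of the evaluation step of such a hybrid approach follows. Each frame of an ensemble is scored by a molecular mechanics energy plus cavity, dispersion, and electrostatic polarization terms, and the ensemble averages for the native and misfolded starting structures are then compared. The dictionary keys and term values are hypothetical placeholders. In practice the polarization term would come from a continuum electrostatics calculation, which is not modeled here.

```python
from statistics import mean

SOLVATION_TERMS = ("g_cavity", "g_dispersion", "g_polarization")

def frame_energy(frame):
    """Molecular mechanics energy plus three solvation terms (kcal/mol)."""
    return frame["e_mm"] + sum(frame[t] for t in SOLVATION_TERMS)

def ensemble_average(frames):
    """Average effective energy over the frames of one ensemble."""
    return mean(frame_energy(f) for f in frames)

# Hypothetical per-frame values (kcal/mol) for two short ensembles.
native_frames = [
    {"e_mm": -1000.0, "g_cavity": 50.0, "g_dispersion": -80.0, "g_polarization": -300.0},
    {"e_mm": -990.0, "g_cavity": 52.0, "g_dispersion": -78.0, "g_polarization": -310.0},
]
misfolded_frames = [
    {"e_mm": -1010.0, "g_cavity": 60.0, "g_dispersion": -70.0, "g_polarization": -250.0},
    {"e_mm": -1005.0, "g_cavity": 58.0, "g_dispersion": -72.0, "g_polarization": -255.0},
]
print(ensemble_average(native_frames), ensemble_average(misfolded_frames))
```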
We have recently developed an effective energy function (EEF1; Lazaridis & Karplus, 1997, 1999) based on the polar hydrogen form of the CHARMM potential energy function (Brooks et al., 1983; Neria et al., 1996), complemented by a theoretically based solvation free energy model. In room-temperature MD simulations, EEF1 has been shown to give stable native structures for a series of proteins, reasonable energies for unfolded conformations, and unfolding pathways in agreement with explicit water simulations (Lazaridis & Karplus, 1997). This suggests that EEF1 may be sufficiently accurate to distinguish native from non-native states for a given sequence. Here, we report the results of “Novotny” pairwise threading tests for a series of proteins with this effective energy function; for comparison, we perform the same tests with the vacuum CHARMM 19 potential energy function. The threading tests are performed on the set of native/misfolded pairs created by Holm & Sander (1992), a subset of which was used by Vorobjev et al. (1998), as mentioned above. Since these tests are successful, we extend them to the small protein CI2 threaded onto the folds of eight proteins of similar size. Finally, we apply EEF1 to a large number of decoys for six proteins prepared by Park & Levitt (1996), and to a set of CASP1 homology models.
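For large decoy sets, discrimination can be summarized by where the native structure ranks among the decoys and by how far below them it lies. The following sketch computes three such measures:

- the native rank,
- the gap to the lowest-energy decoy,
- a Z-score (the native energy relative to the decoy mean, in units of the decoy standard deviation).

The Z-score is a common metric in the decoy literature and is included here as an assumption. It is not necessarily the measure reported in this work. The function name and energies are hypothetical.

```python
from statistics import mean, pstdev

def decoy_summary(e_native, decoy_energies):
    """Rank of the native among decoys (1 = lowest energy), gap to the best
    decoy (positive means the native is lower), and the native Z-score."""
    rank = 1 + sum(1 for e in decoy_energies if e < e_native)
    gap_to_best = min(decoy_energies) - e_native
    mu, sd = mean(decoy_energies), pstdev(decoy_energies)
    z_score = (e_native - mu) / sd if sd > 0 else float("nan")
    return {"rank": rank, "gap_to_best": gap_to_best, "z_score": z_score}

# Hypothetical effective energies (kcal/mol).
summary = decoy_summary(-500.0, [-480.0, -470.0, -460.0, -450.0])
print(summary)
```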
Results
Table 1 shows the results obtained for the Holm-Sander misfolded structures with the CHARMM 19 vacuum potential. The total energy, the van der Waals and electrostatic components, as well as the RMSD after dynamics, are reported. With respect to the energy minimized structures, we see that CHARMM discriminates the correct conformation in most cases; there are three exceptions: 1ppt (avian pancreatic polypeptide), 1sn3 (scorpion neurotoxin), and 3b5c (cytochrome b5). After MD simulations are
Discussion
Compared with statistical database potentials, physical effective energy functions have many advantages. First, they have a sound theoretical basis, whereas the theoretical basis of database potentials is still being debated. Second, because they are “complete” energy functions with analytic first derivatives, they can be used in energy minimization and dynamics to study the energetics of unfolded, misfolded, and partially folded states, in addition to the native state. This can be very
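The practical value of analytic first derivatives can be illustrated with a single Lennard-Jones pair, used here as a stand-in for a full force field (it is not the CHARMM energy function). The sketch first checks the analytic gradient against a central finite difference. It then uses that gradient in a steepest-descent minimization, which should converge to the known minimum at r = 2^(1/6)σ. The well depth, σ, step size, and tolerance are hypothetical choices.

```python
def lj_energy(r, eps=0.2, sigma=3.5):
    """Lennard-Jones pair energy 4*eps*[(sigma/r)^12 - (sigma/r)^6], kcal/mol."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def lj_gradient(r, eps=0.2, sigma=3.5):
    """Analytic first derivative dE/dr of lj_energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (6.0 * sr6 - 12.0 * sr6 * sr6) / r

def finite_difference(f, r, h=1e-5):
    """Central-difference estimate of df/dr, used only to check the gradient."""
    return (f(r + h) - f(r - h)) / (2.0 * h)

def steepest_descent(r0, step=0.5, tol=1e-10, max_iter=1000):
    """Minimize the pair energy along r using the analytic gradient."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r

print(lj_gradient(4.2), finite_difference(lj_energy, 4.2))
print(steepest_descent(4.5), 2 ** (1 / 6) * 3.5)
```

The same pattern, an energy routine paired with an exact gradient routine, is what makes minimization and dynamics possible for the full effective energy function.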
Methods
The details of the effective energy function are given elsewhere (Lazaridis & Karplus, 1997, 1999). The solvent contribution to the effective energy (potential of mean force) is based on the assumption that the solvation free energy of a macromolecule, $\Delta G_{\mathrm{solv}}$, is the sum of contributions from its constituent groups:

$$\Delta G_{\mathrm{solv}} = \sum_i \Bigl[\Delta G_i^{\mathrm{ref}} - \sum_{j \neq i} f_i(r_{ij})\,V_j\Bigr]$$

where $\Delta G_i^{\mathrm{ref}}$ is the solvation free energy of group $i$ in a reference compound, $V_j$ is the volume of group $j$ and $f_i(r_{ij})$ is
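The group-additive solvation sum can be sketched as follows. Each group i contributes its reference solvation free energy ΔG_i^ref. This is reduced by the solvation that i loses because neighboring groups j, of volume V_j, exclude solvent; that loss is approximated as f_i(r_ij)V_j.

The form of f_i used here is an assumption modeled on the EEF1 papers: 4πr²f_i(r) = α_i exp(−x_i²), with x_i = (r − R_i)/λ_i. The constant α_i is chosen so that the density integrated from R_i outward equals ΔG_i^free, the solvation free energy of the fully exposed group. The exact form, exclusion rules, cutoffs, and parameters should be checked against Lazaridis & Karplus (1999). All function names and numbers below are hypothetical.

```python
import math

def gaussian_density(r, dg_free, radius, lam):
    """Assumed solvation free energy density f_i(r) of group i (kcal/mol/A^3).

    4*pi*r^2*f_i(r) = alpha_i*exp(-x^2), x = (r - R_i)/lambda_i, with
    alpha_i = 2*dG_i^free/(sqrt(pi)*lambda_i), so that integrating
    4*pi*r^2*f_i(r) from R_i to infinity gives dG_i^free.
    """
    x = (r - radius) / lam
    alpha = 2.0 * dg_free / (math.sqrt(math.pi) * lam)
    return alpha * math.exp(-x * x) / (4.0 * math.pi * r * r)

def solvation_free_energy(groups, coords):
    """dG_solv = sum_i [dG_i^ref - sum_{j != i} f_i(r_ij) * V_j].

    No exclusions or cutoffs are applied in this sketch.
    """
    total = 0.0
    for i, gi in enumerate(groups):
        total += gi["dg_ref"]
        for j, gj in enumerate(groups):
            if j == i:
                continue
            r_ij = math.dist(coords[i], coords[j])
            f_i = gaussian_density(r_ij, gi["dg_free"], gi["radius"], gi["lam"])
            total -= f_i * gj["volume"]  # solvation of i lost to solvent excluded by j
    return total

# Two identical hypothetical polar groups (energies in kcal/mol, lengths in A).
group = {"dg_ref": -5.0, "dg_free": -6.0, "radius": 1.9, "lam": 3.5, "volume": 20.0}
far = solvation_free_energy([group, group], [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)])
close = solvation_free_energy([group, group], [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)])
print(f"far apart: {far:.3f} kcal/mol, in contact: {close:.3f} kcal/mol")
```

Bringing the two groups into contact makes ΔGsolv less negative, because each group partially loses its favorable solvation. The double loop scales as O(N²), so a practical implementation would presumably restrict it to pairs within a cutoff.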
Acknowledgements
This work was supported by a grant from the National Science Foundation. T.L. was a Burroughs Wellcome PMMB Postdoctoral fellow. We thank the people who created the decoys used in this work, as well as those who created and maintain the CARB and Stanford Web sites from which they were obtained. We also thank a referee for urging a test of EEF1 on a broader set of decoys than that used originally; it was as a result of that comment that the Park & Levitt and CASP1 decoy sets were added.
References (46)
- et al. Structure-derived hydrophobic potential. J. Mol. Biol. (1992).
- Protein structure: what is it possible to predict now? Curr. Opin. Struct. Biol. (1997).
- et al. An efficient mean solvation force model for use in molecular dynamics simulations of proteins in aqueous solution. J. Mol. Biol. (1996).
- et al. Evaluation of protein models by atomic solvation preference. J. Mol. Biol. (1992).
- et al. Configurational entropy of native proteins. Biophys. J. (1987).
- Molecular dynamics of native protein II. Analysis and nature of motion. J. Mol. Biol. (1983).
- et al. Refolding of potato carboxypeptidase inhibitor by molecular dynamics simulations with disulphide bond constraints. J. Mol. Biol. (1998).
- et al. Computer modeling of protein folding: conformational and energetic analysis of reduced and detailed protein models. J. Mol. Biol. (1995).
- Comparison of database potential and molecular mechanics force fields. Curr. Opin. Struct. Biol. (1997).
- et al. An analysis of incorrectly folded protein models. Implications for structure predictions. J. Mol. Biol. (1984).
- Energy functions that discriminate X-ray and near-native fold from well-constructed decoys. J. Mol. Biol.
- Factors affecting the ability of energy functions to discriminate correct from incorrect folds. J. Mol. Biol.
- Contribution of hydration to protein folding thermodynamics II. The entropy and Gibbs energy of hydration. J. Mol. Biol.
- An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction. J. Mol. Biol.
- An efficient, differentiable hydration potential of peptides and proteins. J. Comput. Chem.
- An assessment of the accuracy of the RRIGS hydration potential: comparison to solutions of the Poisson-Boltzmann equation. J. Comput. Chem.
- A method to identify protein sequences that fold into a known three-dimensional structure. Science.
- CHARMM: a program for macromolecular energy minimization and dynamics calculations. J. Comput. Chem.
- Proteins: a theoretical perspective of dynamics, structure, and thermodynamics. Adv. Chem. Phys.
- Spin glasses and the statistical mechanics of protein folding. Proc. Natl Acad. Sci. USA.
- Locally accessible conformations of proteins: multiple molecular dynamics simulations of crambin. Protein Sci.
- Evaluation of atomic level mean force potentials via inverse folding and inverse refinement of protein structures. Protein Eng.
- Solvation energy in protein folding and binding. Nature.
1 Edited by A. R. Fersht
2 Present address: T. Lazaridis, Department of Chemistry, City College of New York, Convent Ave & 138th St. New York, NY 10031, USA.