Protein Structure Prediction Using Rosetta

https://doi.org/10.1016/S0076-6879(04)83004-0

Publisher Summary

This chapter describes protein structure prediction using Rosetta. Double-blind assessments of protein structure prediction methods have indicated that the Rosetta algorithm is perhaps the most successful current method for de novo protein structure prediction. In the Rosetta method, short fragments of known proteins are assembled by a Monte Carlo strategy to yield native-like protein conformations. Using only sequence information, successful Rosetta predictions yield models with typical accuracies of 3–6 Å Cα root mean square deviation (RMSD) from the experimentally determined structures for contiguous segments of 60 or more residues. For each structure prediction, many short simulations starting from different random seeds are carried out to generate an ensemble of decoy structures that have both favorable local interactions and protein-like global properties. This set is then clustered by structural similarity to identify the broadest free energy minima. The effectiveness of conformation modification operators for energy function optimization is also described in this chapter.
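The clustering step mentioned in the summary can be illustrated with a small sketch. The summary says only that decoys are clustered by structural similarity; the greedy neighbor-count scheme, the function name `cluster_decoys`, the `radius` threshold, and the toy distance matrix below are all assumptions made for illustration, not the chapter's stated procedure.

```python
from typing import List, Sequence


def cluster_decoys(dist: Sequence[Sequence[float]], radius: float) -> List[List[int]]:
    """Greedy neighbor-count clustering over a pairwise distance matrix.

    dist[i][j] is a structural distance between decoys i and j (for
    example a Calpha RMSD in angstroms). Returns clusters as lists of
    decoy indices; the first index in each cluster is its center.
    """
    remaining = set(range(len(dist)))
    clusters: List[List[int]] = []
    while remaining:
        def neighbors(i: int) -> List[int]:
            # Decoys still unassigned that lie within `radius` of decoy i.
            return [j for j in sorted(remaining) if j != i and dist[i][j] <= radius]

        # The center is the unassigned decoy with the most neighbors;
        # ties go to the lowest index so the result is deterministic.
        center = max(sorted(remaining), key=lambda i: len(neighbors(i)))
        members = [center] + neighbors(center)
        clusters.append(members)
        remaining -= set(members)
    return clusters


# Hypothetical pairwise distances for five decoys (illustrative values only).
toy_dist = [
    [0.0, 1.5, 1.8, 6.0, 7.0],
    [1.5, 0.0, 1.2, 6.5, 7.2],
    [1.8, 1.2, 0.0, 5.9, 6.8],
    [6.0, 6.5, 5.9, 0.0, 1.9],
    [7.0, 7.2, 6.8, 1.9, 0.0],
]
print(cluster_decoys(toy_dist, radius=2.0))  # prints [[0, 1, 2], [3, 4]]
```

In this scheme the largest cluster comes out first, which matches the idea of looking for the broadest minimum: the most populated region of the decoy ensemble, rather than the single lowest-scoring decoy.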

Introduction

Double-blind assessments of protein structure prediction methods, held biennially in the community-wide Critical Assessment of Structure Prediction (CASP) experiments, have documented significant progress in the field of protein structure prediction and have indicated that the Rosetta algorithm is perhaps the most successful current method for de novo protein structure prediction.1, 2, 3 In the Rosetta method, short fragments of known proteins are assembled by a Monte Carlo strategy to yield native-like protein conformations. Using only sequence information, successful Rosetta predictions yield models with typical accuracies of 3–6 Å Cα root mean square deviation (RMSD) from the experimentally determined structures for contiguous segments of 60 or more residues. In such low- to moderate-accuracy models of protein structure, the global topology is correctly predicted, the architecture of secondary structure elements is generally correct, and functional residues are frequently clustered to an active site region. Models obtained by de novo prediction methods have been demonstrated to have utility for obtaining biological insight, either through functional site recognition or functional annotation by fold identification.4, 5 The Rosetta method is sufficiently fast to make genome-scale analysis possible: a recent study predicted structures for ≈500 PfamA families with no link to known structure.6 On the basis of previous performance, one of the five models reported for each Pfam family is expected to be a reasonable match to the true structure for about 50–60% of the families, and many of these predictions suggest a homology unapparent in their sequences
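For reference, the Cα RMSD used as the accuracy measure here is conventionally defined after optimal rigid-body superposition of the model onto the experimental structure. The notation is introduced for this note and does not come from the chapter: x_i and y_i are the Cα coordinates of residue i in the model and in the experimental structure, N is the number of residues in the compared segment, and R and t are a rotation matrix and a translation vector.

```latex
\mathrm{RMSD} = \min_{R,\,\mathbf{t}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}}
```

Under this definition, the quoted 3–6 Å accuracy means that the Cα atoms of a contiguous segment of 60 or more residues deviate, in the root-mean-square sense, by 3–6 Å from their experimental positions once the two structures are optimally aligned.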

Because of its success in de novo structure prediction, the Rosetta method has also been successfully extended to other protein-modeling problems including structure determination using limited experimental constraints,7, 8 de novo protein design,9, 10 protein–protein docking,11 and loop modeling.12 Structure determination by using Rosetta in combination with limited experimental constraints generally yields structures of higher overall accuracy, often with an RMSD of 2–3 Å over the entire protein. Loop modeling is carried out in the context of a homology-based template that is also frequently only ≈2 Å from the true structure. For design of novel protein structures, sequence selection algorithms require backbone structures of accuracy equivalent to experimentally determined X-ray crystal structures. To address these problems, as well as to refine de novo models, improvements to the Rosetta method have focused on increased detail in the potential functions and finer control of chain motion in the search algorithm.

Although de novo structure prediction with the Rosetta algorithm has been previously described, here we summarize the current method in its entirety. The benefits and limitations of the fragment assembly strategy utilized by Rosetta are discussed, and we describe adaptations of the Rosetta method for structural modeling with finer resolution. Enhancements to the fragment assembly strategy that allow more local modifications of protein conformation are described, and the effectiveness of these operators for energy function minimization is illustrated. In addition, in Appendix I we derive a new, efficient approach to screening local moves; that is, finding short sets of torsional angle changes that permit local changes in a protein chain while collectively minimizing global changes. Our formulation is computationally fast and correlates better with the global distance changes relevant to the atomic interaction potentials than previous popular methods (e.g., Gunn12a). The method is applicable both to screening discrete moves and to gradient descent of continuous multiangle moves.

Section snippets

Rosetta Strategy

A guiding principle of the Rosetta algorithm is to attempt to mimic the interplay of local and global interactions in determining protein structure. The method is based on the experimental observation that local sequence preferences bias but do not uniquely define the local structure of a protein. The final native conformation is obtained when these fluctuating local structures come together to yield a compact conformation with favorable nonlocal interactions, such as buried hydrophobic

Fragment Selection

The basic conformation modification operation employed by Rosetta is termed a “fragment insertion.” For each fragment insertion, a consecutive window of three or nine residues is selected, and the torsion angles of these residues are replaced with the torsion angles obtained from a fragment of a protein of known structure. For each query sequence to be predicted, a customized library of fragments defining the conformational space to be searched is selected by comparison of short windows of the
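The fragment insertion operation described above can be sketched as follows, together with a simple Monte Carlo loop that uses it. This is a minimal illustration under stated assumptions: the function names, the library layout (a mapping from window start to candidate fragments), the Metropolis acceptance rule, the temperature, and the toy energy are all hypothetical and are not Rosetta's actual scoring function or protocol.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Torsions = Tuple[float, float, float]  # (phi, psi, omega) in degrees


def insert_fragment(chain: List[Torsions], start: int,
                    fragment: List[Torsions]) -> List[Torsions]:
    """Return a copy of `chain` whose window at `start` carries the fragment's torsions."""
    if start < 0 or start + len(fragment) > len(chain):
        raise ValueError("fragment window falls outside the chain")
    return chain[:start] + list(fragment) + chain[start + len(fragment):]


def monte_carlo_assembly(chain: List[Torsions],
                         library: Dict[int, List[List[Torsions]]],
                         energy: Callable[[List[Torsions]], float],
                         steps: int, temperature: float = 1.0,
                         seed: int = 0) -> Tuple[List[Torsions], float]:
    """Random fragment insertions accepted by a Metropolis rule (an assumption)."""
    rng = random.Random(seed)
    current, e_current = list(chain), energy(chain)
    best, e_best = current, e_current
    for _ in range(steps):
        start = rng.choice(sorted(library))     # random window position
        fragment = rng.choice(library[start])   # random fragment for that window
        trial = insert_fragment(current, start, fragment)
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


def toy_energy(chain: List[Torsions]) -> float:
    """Toy score only: squared deviation from helical (phi, psi) = (-60, -45)."""
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi, _ in chain)


# A five-residue extended chain and a hypothetical three-residue helical fragment.
extended = [(180.0, 180.0, 180.0)] * 5
helix_fragment = [(-60.0, -45.0, 180.0)] * 3
toy_library = {0: [helix_fragment], 1: [helix_fragment], 2: [helix_fragment]}

model, score = monte_carlo_assembly(extended, toy_library, toy_energy, steps=50)
print(score <= toy_energy(extended))
```

Working in torsion space keeps each move cheap: an insertion is a list splice, and only the energy evaluation touches the whole chain. The example uses three-residue windows; nine-residue fragments would be handled identically by the same functions.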

Structure Prediction by Fragment Assembly

The fragment assembly approach has multiple benefits for de novo protein structure prediction. First, and foremost, the fragment library approximates Gibbs sampling of the populated regions of the local potential energy surface of the backbone. The Rosetta philosophy is that during the folding process of real proteins, the local structure fluctuates between alternative local conformations and each fragment is a likely conformation of the local sequence. The use of a preset library of low-energy

Enhancements of Fragment Insertion Strategy

For de novo fold prediction, the benefits of fragment insertion allow rapid convergence on collapsed structures of plausible topology. Once this initial collapse has occurred, however, the fragment insertion strategy hinders efficient model refinement. Within a compact structure, any randomly selected, rigid body transformation of part of the chain is likely to create a clash with neighboring atoms or break favorable contacts. In addition, once the structure is coarsely established, the scale

Effectiveness of Conformation Modification Operators for Energy Function Optimization

Modified fragment insertions of the Gunn type have been incorporated into the de novo prediction protocol, as described above, and permit significant optimization of the scoring function that is often accompanied by improvements in decoy accuracy and/or discrimination of near-native decoys.31 When the Rosetta strategy is combined with structural constraints, experimentally determined by nuclear magnetic resonance (NMR), the incorporation of the modified moves described here is essential for

Conclusions

Although any protein-modeling strategy must attempt to find an optimal tradeoff between cost of computation of each move and the effectiveness of modifications in optimizing a cost function, the optimal tradeoff is specific to the particular problem of interest. The random selection of fragment insertions without consideration of gradient information or likelihood of the modification being accepted allows fragment insertion to be an extremely rapid operation and is well suited for de novo

Supplemental Materials

Licensing information for Rosetta may be obtained by e-mail ([email protected], [email protected], [email protected]). In addition, automated Rosetta predictions can be obtained from the Rosetta server32 at http://robetta.bakerlab.org. The Rosetta server uses a combination of de novo prediction and homology modeling to produce complete three-dimensional models for proteins. Rosetta fragment libraries can be obtained from the automated server

References (33)

  • R. Bonneau et al., J. Struct. Biol. (2001)
  • J.A. Di Gennaro et al., J. Struct. Biol. (2001)
  • R. Bonneau et al., J. Mol. Biol. (2002)
  • B. Kuhlman et al., J. Mol. Biol. (2002)
  • J.J. Gray et al., J. Mol. Biol. (2003)
  • K.T. Simons et al., J. Mol. Biol. (1997)
  • T. Kortemme et al., J. Mol. Biol. (2003)
  • D.T. Jones, J. Mol. Biol. (1999)
  • K.T. Simons et al., J. Mol. Biol. (2001)
  • R. Bonneau et al., Proteins Struct. Funct. Genet. (2001)
  • P. Bradley et al., Proteins Struct. Funct. Genet. (2003)
  • A.M. Lesk et al., Proteins Struct. Funct. Genet. (2001)
  • P.M. Bowers et al., J. Biomol. NMR (2000)
  • C.A. Rohl et al., J. Am. Chem. Soc. (2002)
  • B. Kuhlman et al., Science (2003)
  • C.A. Rohl et al., Proteins Struct. Funct. Genet. (2003)