Protein Structure Prediction Using Rosetta
Introduction
Double-blind assessments of protein structure prediction methods, held biennially in the community-wide critical assessment of structure prediction (CASP) experiments, have documented significant progress in the field of protein structure prediction and have indicated that the Rosetta algorithm is perhaps the most successful current method for de novo protein structure prediction.1, 2, 3 In the Rosetta method, short fragments of known proteins are assembled by a Monte Carlo strategy to yield native-like protein conformations. Using only sequence information, successful Rosetta predictions yield models with typical accuracies of 3–6 Å Cα root mean square deviation (RMSD) from the experimentally determined structures for contiguous segments of 60 or more residues. In such low- to moderate-accuracy models of protein structure, the global topology is correctly predicted, the architecture of secondary structure elements is generally correct, and functional residues are frequently clustered in an active site region. Models obtained by de novo prediction methods have been demonstrated to have utility for obtaining biological insight, either through functional site recognition or functional annotation by fold identification.4, 5 The Rosetta method is sufficiently fast to make genome-scale analysis possible: a recent study predicted structures for ≈500 PfamA families with no link to known structure.6 On the basis of previous performance, one of the five models reported for each Pfam family is expected to be a reasonable match to the true structure for about 50–60% of the families, and many of these predictions suggest a homology that is not apparent from their sequences.
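The accuracy figures above are stated as Cα RMSD. As a minimal sketch of that metric, the function below computes the RMSD over paired Cα coordinates. It assumes the two coordinate sets have already been optimally superimposed (the superposition step is not shown), and the function name `ca_rmsd` is hypothetical rather than part of Rosetta.

```python
import math


def ca_rmsd(model, native):
    """Root mean square deviation between paired C-alpha coordinates.

    Both arguments are equal-length lists of (x, y, z) tuples in angstroms.
    Assumption: the two structures are already superimposed.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Sum of squared distances between corresponding atoms.
    squared = sum(
        (a - b) ** 2
        for atom_m, atom_n in zip(model, native)
        for a, b in zip(atom_m, atom_n)
    )
    return math.sqrt(squared / len(model))
```

For example, two atoms that are each displaced by 3 Å along one axis give an RMSD of 3 Å.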
Because of its success in de novo structure prediction, the Rosetta method has also been extended to other protein-modeling problems, including structure determination using limited experimental constraints,7, 8 de novo protein design,9, 10 protein–protein docking,11 and loop modeling.12 Structure determination by using Rosetta in combination with limited experimental constraints generally yields structures of higher overall accuracy, often with an RMSD of 2–3 Å over the entire protein. Loop modeling is carried out in the context of a homology-based template that is also frequently only ≈2 Å from the true structure. For design of novel protein structures, sequence selection algorithms require backbone structures of accuracy equivalent to experimentally determined X-ray crystal structures. To address these problems, as well as to refine de novo models, improvements to the Rosetta method have focused on increased detail in the potential functions and finer control of chain motion in the search algorithm.
Although de novo structure prediction with the Rosetta algorithm has been previously described, here we summarize the current method in its entirety. The benefits and limitations of the fragment assembly strategy utilized by Rosetta are discussed, and we describe adaptations of the Rosetta method for structural modeling with finer resolution. Enhancements to the fragment assembly strategy that allow more local modifications of protein conformation are described, and the effectiveness of these operators for energy function minimization is illustrated. In addition, in Appendix I we derive a new, efficient approach to screening local moves; that is, finding short sets of torsional angle changes that permit local changes in a protein chain while collectively minimizing global changes. Our formulation is computationally fast and correlates better with the global distance changes relevant to the atomic interaction potentials than previous popular methods (e.g., Gunn12a). The method is applicable both to screening discrete moves and to gradient descent of continuous multiangle moves.
Rosetta Strategy
A guiding principle of the Rosetta algorithm is to attempt to mimic the interplay of local and global interactions in determining protein structure. The method is based on the experimental observation that local sequence preferences bias but do not uniquely define the local structure of a protein. The final native conformation is obtained when these fluctuating local structures come together to yield a compact conformation with favorable nonlocal interactions, such as buried hydrophobic
Fragment Selection
The basic conformation modification operation employed by Rosetta is termed a “fragment insertion.” For each fragment insertion, a consecutive window of three or nine residues is selected, and the torsion angles of these residues are replaced with the torsion angles obtained from a fragment of a protein of known structure. For each query sequence to be predicted, a customized library of fragments defining the conformational space to be searched is selected by comparison of short windows of the
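The fragment insertion operation described above can be sketched as a replacement of a window of backbone torsion angles. The following is an illustrative sketch only: the representation of each residue as a `(phi, psi, omega)` tuple and the function name `insert_fragment` are assumptions made for this example, not Rosetta's actual data structures.

```python
def insert_fragment(torsions, fragment, start):
    """Return a new torsion list with one window replaced by a fragment.

    torsions: list of (phi, psi, omega) tuples, one per residue (assumed layout).
    fragment: list of (phi, psi, omega) tuples taken from a known structure.
    start:    index of the first residue in the window.
    """
    size = len(fragment)
    if size not in (3, 9):
        raise ValueError("fragments span three or nine residues")
    if start < 0 or start + size > len(torsions):
        raise ValueError("window does not fit inside the chain")
    updated = list(torsions)  # copy, so the input conformation is left intact
    updated[start:start + size] = fragment
    return updated


# Usage: replace residues 1-3 of a five-residue helical chain with a strand-like fragment.
helix = [(-60.0, -45.0, 180.0)] * 5
strand_fragment = [(-120.0, 130.0, 180.0)] * 3
modified = insert_fragment(helix, strand_fragment, start=1)
```

Returning a copy rather than editing in place keeps the previous conformation available, which is convenient when a trial move may later be rejected.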
Structure Prediction by Fragment Assembly
The fragment assembly approach has multiple benefits for de novo protein structure prediction. First, and foremost, the fragment library approximates Gibbs sampling of the populated regions of the local potential energy surface of the backbone. The Rosetta philosophy is that during the folding process of real proteins, the local structure fluctuates between alternative local conformations and each fragment is a likely conformation of the local sequence. The use of a preset library of low-energy
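A toy version of Monte Carlo fragment assembly is sketched below. Several parts are assumptions made for illustration and are not taken from the text: the Metropolis acceptance rule, the extended starting chain, the fixed temperature, the per-position `library` dictionary, and the `toy_energy` function, which merely rewards helical torsion angles and bears no relation to Rosetta's scoring function.

```python
import math
import random

EXTENDED = (-150.0, 150.0, 180.0)  # assumed starting torsions (phi, psi, omega)
HELICAL = (-60.0, -45.0, 180.0)


def toy_energy(torsions):
    """Hypothetical score penalizing deviation from helical phi/psi angles."""
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi, _ in torsions) / 1000.0


def toy_library(n_residues, size=3):
    """Hypothetical library: a helical and an extended fragment per window start."""
    return {
        start: [[HELICAL] * size, [EXTENDED] * size]
        for start in range(n_residues - size + 1)
    }


def metropolis_accept(delta, temperature, rng):
    """Accept downhill moves always, uphill moves with probability exp(-delta/T)."""
    return delta <= 0.0 or rng.random() < math.exp(-delta / temperature)


def assemble(n_residues, library, energy, steps, size=3, temperature=1.0, seed=0):
    """Assemble fragments by Monte Carlo, returning the lowest-energy conformation seen."""
    rng = random.Random(seed)
    current = [EXTENDED] * n_residues
    e_current = energy(current)
    best, e_best = current, e_current
    for _ in range(steps):
        # Pick a random window and a random fragment from that window's library.
        start = rng.randrange(n_residues - size + 1)
        fragment = rng.choice(library[start])
        trial = current[:start] + list(fragment) + current[start + size:]
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_current, temperature, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


best, e_best = assemble(12, toy_library(12), toy_energy, steps=500)
```

Note that each move is chosen at random, without gradient information, so the cost per step is dominated by the energy evaluation.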
Enhancements of Fragment Insertion Strategy
For de novo fold prediction, the benefits of fragment insertion allow rapid convergence on collapsed structures of plausible topology. Once this initial collapse has occurred, however, the fragment insertion strategy hinders efficient model refinement. Within a compact structure, any randomly selected, rigid body transformation of part of the chain is likely to create a clash with neighboring atoms or break favorable contacts. In addition, once the structure is coarsely established, the scale
Effectiveness of Conformation Modification Operators for Energy Function Optimization
Modified fragment insertions of the Gunn type have been incorporated into the de novo prediction protocol, as described above, and permit significant optimization of the scoring function that is often accompanied by improvements in decoy accuracy and/or discrimination of near-native decoys.31 When the Rosetta strategy is combined with structural constraints, experimentally determined by nuclear magnetic resonance (NMR), the incorporation of the modified moves described here is essential for
Conclusions
Although any protein-modeling strategy must attempt to find an optimal tradeoff between cost of computation of each move and the effectiveness of modifications in optimizing a cost function, the optimal tradeoff is specific to the particular problem of interest. The random selection of fragment insertions without consideration of gradient information or likelihood of the modification being accepted allows fragment insertion to be an extremely rapid operation and is well suited for de novo
Supplemental Materials
Licensing information for Rosetta may be obtained by e-mail ([email protected], [email protected], [email protected]). In addition, automated Rosetta predictions can be obtained from the Rosetta server32 at http://robetta.bakerlab.org. The Rosetta server uses a combination of de novo prediction and homology modeling to produce complete three-dimensional models for proteins. Rosetta fragment libraries can be obtained from the automated server
References
- et al. J. Struct. Biol. (2001)
- et al. J. Struct. Biol. (2001)
- et al. J. Mol. Biol. (2002)
- et al. J. Mol. Biol. (2002)
- et al. J. Mol. Biol. (2003)
- et al. J. Mol. Biol. (1997)
- et al. J. Mol. Biol. (2003)
- J. Mol. Biol. (1999)
- et al. J. Mol. Biol. (2001)
- et al. Proteins Struct. Funct. Genet. (2001)
- Proteins Struct. Funct. Genet.
- Proteins Struct. Funct. Genet.
- J. Biomol. NMR
- J. Am. Chem. Soc.
- Science
- Proteins Struct. Funct. Genet.