Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics
Review
You are lost without a map: Navigating the sea of protein structures
Introduction
Advances in crystallization, data collection, and computing have made macromolecular crystal structures commonplace. Biochemists, medicinal chemists, chemical biologists, and many others have come to rely on macromolecular structural data as never before, and it has become routine to read, write, and review manuscripts that contain crystal structures. Furthermore, advances in the field have made it possible for scientists with limited training in crystallography to determine protein structures. Thus, even scientists with no formal background in crystallography need to know how to critically evaluate these complex experiments. While it has been noted recently that poorly determined structures have a negative impact on the drug design community [2], the focus here is on how to avoid the improper use of well-determined structural models. The first step is to understand how crystallographic models are made.
Every atom in the repeating unit of a crystal (the unit cell) contributes to the intensity of every reflection in the diffraction pattern. The measured intensity for each diffraction spot is the result of scattering from the entire model. Particular data points cannot be associated with specific parts of a model. For example, there is no “metal spot” in data collected from a metalloprotein crystal; the metal contributes to the intensity of every reflection (see Box A for a description of the crystallographic experiment). While crystallographic statistics reported in structure papers provide numerical indications of the overall quality of the diffraction data (for an excellent review, see [3]), these do not report on how well-determined individual parts of a model are. The Protein Data Bank (PDB) has recently adopted a new structure report format that gives a graphical representation of how a given model compares with others in the PDB in terms of five statistical measures of model quality [4], [5], [6], [7]. These reports are based on the excellent work of numerous leaders in the field of X-ray structure determination [6]. As good as these reports are, they are focused on the global quality of the structure.
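The statement that every atom contributes to every reflection follows from the structure-factor sum, F(hkl) = Σ_j f_j exp[2πi(hx_j + ky_j + lz_j)], in which each atom j at fractional position (x_j, y_j, z_j) adds a term to every reflection hkl; the measured intensity is proportional to |F(hkl)|². The Python sketch below uses a hypothetical four-atom unit cell with constant scattering powers (real calculations use resolution-dependent atomic form factors and displacement parameters, both omitted here as a simplifying assumption) and shows that removing the "metal" changes the intensity of every listed reflection rather than of one "metal spot".

```python
import cmath
import math

# Hypothetical unit cell: (label, fractional coordinates, scattering power).
# Constant scattering powers stand in for real form factors; B-factors are ignored.
ATOMS = [
    ("C", (0.10, 0.20, 0.30), 6.0),
    ("N", (0.40, 0.15, 0.70), 7.0),
    ("O", (0.75, 0.60, 0.25), 8.0),
    ("Zn", (0.50, 0.50, 0.50), 30.0),  # the "metal" of a metalloprotein
]


def structure_factor(hkl, atoms):
    """F(hkl) = sum over atoms of f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j))."""
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for _label, (x, y, z), f in atoms
    )


def intensity(hkl, atoms):
    """Measured intensity is proportional to |F|^2; the phase is not recorded."""
    return abs(structure_factor(hkl, atoms)) ** 2


reflections = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 3), (3, 2, 1)]
without_metal = [atom for atom in ATOMS if atom[0] != "Zn"]

for hkl in reflections:
    with_zn = intensity(hkl, ATOMS)
    without_zn = intensity(hkl, without_metal)
    print(f"{hkl}: with Zn {with_zn:8.1f}   without Zn {without_zn:8.1f}")
```

Every reflection in the list changes when the zinc is removed, which is why no single data point can be assigned to one part of the model.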
Even in the best cases, some areas of the electron density map are poorly defined (Fig. 1). Thus, even a crystal structure that is based on high-quality diffraction data and was carefully and competently built and refined will have local areas of the model that are less reliable than the rest. Very often, these regions are on the surface of a protein and, for most users, will not be important in drawing conclusions about molecular structure and function. Of course, if one is interested in protein–protein interactions, these surface regions are relevant. One's interests determine which parts of the electron density map to inspect.
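One widely used way to put a number on how well a local region is defined is the real-space correlation coefficient (RSCC): the Pearson correlation between the observed density and the density calculated from the model, taken over the map grid points around a residue or ligand. The sketch below assumes both densities have already been sampled at the same grid points; the short lists are invented for illustration, and programs that report RSCC differ in how they choose the grid points and compute the model density. A low value is a reason to go and look at the map in that region, not a substitute for doing so.

```python
import math


def real_space_cc(rho_obs, rho_calc):
    """Pearson correlation between observed and model-calculated density
    sampled at the same grid points around one residue (hypothetical inputs)."""
    if len(rho_obs) != len(rho_calc) or len(rho_obs) < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    n = len(rho_obs)
    mean_o = sum(rho_obs) / n
    mean_c = sum(rho_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mean_o) ** 2 for o in rho_obs)
    var_c = sum((c - mean_c) ** 2 for c in rho_calc)
    if var_o == 0 or var_c == 0:
        return float("nan")  # flat density: correlation is undefined
    return cov / math.sqrt(var_o * var_c)


# Made-up density values for illustration only.
model = [0.1, 0.8, 1.5, 0.9, 0.2]
well_defined = [0.2, 0.7, 1.4, 1.0, 0.1]    # observed density follows the model
poorly_defined = [0.3, 0.2, 0.4, 0.1, 0.3]  # little density where the atoms sit

print("well-defined residue:  ", round(real_space_cc(well_defined, model), 2))
print("poorly defined residue:", round(real_space_cc(poorly_defined, model), 2))
```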
Mobile, disordered sections of the polypeptide, which give rise to poorly defined regions of the electron density map, frequently have important functions. For example, an enzyme may adopt multiple conformations associated with substrate entry, catalysis, and product egress. In addition, no protein model is produced entirely objectively, since human judgment always plays a role. Recognizing where uncertainty and bias may intrude is an important skill for a structure user who wishes to extract meaningful biological or chemical conclusions from a structural model. To assess which parts of a model are strongly supported by the data and which are less so, one cannot rely on global statistical indicators, but should instead examine the electron density maps in regions of functional interest. Fortunately, most journals now require authors to deposit structure factors (the processed experimental data with associated phase estimates) along with the atomic coordinates in the PDB, making it easy to generate maps. The only way for users of macromolecular structures to evaluate the quality of the electron density maps used to build a model is to actually look at them. To avoid basing important experiments on weak structural data, users of macromolecular models must judge which parts of a model are relevant and trustworthy. The information content of the model might not support every idea the structure user has about the molecule.
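Generating a map from structure factors is a Fourier synthesis: each structure factor, an amplitude |F(h)| with a phase φ(h), contributes one wave, ρ(x) = Σ_h |F(h)| exp[iφ(h)] exp(-2πihx) (up to a scale factor; here the unit cell length is 1). The one-dimensional Python sketch below uses two hypothetical atoms and, unlike a real experiment in which only amplitudes are measured, takes the phases directly from the model. H_MAX is an illustrative stand-in for the resolution limit. The peaks of the resulting map fall at the atom positions.

```python
import cmath
import math

# Hypothetical 1D crystal with a unit cell of length 1: (fractional x, scattering power).
ATOMS_1D = [(0.20, 6.0), (0.55, 8.0)]
H_MAX = 10  # highest index used; stands in for the resolution limit


def f_calc(h):
    """Complex structure factor |F|exp(i*phi) computed from the model."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in ATOMS_1D)


# Amplitudes AND phases. In a real experiment only the amplitudes are measured.
coefficients = {h: f_calc(h) for h in range(-H_MAX, H_MAX + 1)}


def density(x):
    """rho(x) = sum_h F(h) * exp(-2*pi*i*h*x), omitting the 1/V scale factor."""
    total = sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in coefficients.items())
    # Because F(-h) is the complex conjugate of F(h), the imaginary parts cancel.
    return total.real


grid = [i / 100 for i in range(100)]
rho = [density(x) for x in grid]
threshold = 0.5 * max(rho)

# Local maxima above half the highest peak (the grid wraps around the unit cell).
peaks = [
    grid[i]
    for i in range(len(grid))
    if rho[i] > threshold and rho[i] > rho[i - 1] and rho[i] > rho[(i + 1) % len(grid)]
]
print("peaks at x =", peaks)
```

Lowering H_MAX (fewer reflections, that is, lower resolution) broadens the peaks. Taking phases from a model, as this toy does, is also the route by which model bias can enter a real map.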
The coordinate file
Scientists who work with protein structures routinely download PDB coordinate files from the Protein Data Bank and view models in a graphics program such as Coot [8], [9], PyMOL [10], Chimera [11], or Jmol [12]. The coordinate file is actually a simple text file that can be inspected with any text editor (Box B). Model users are encouraged to inspect the file in this way, because the header of the PDB file contains important information regarding the protein sample, the experimental setup, and …
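Because the legacy PDB format is a fixed-column text file, the fields a structure user cares about can be read directly: the resolution in the REMARK 2 header record and, for every atom in the ATOM/HETATM records, its alternate-location indicator (column 17), occupancy (columns 55-60), and B-factor (columns 61-66). The Python sketch below parses a short, made-up fragment (hypothetical coordinates and values, not taken from any deposited entry) and reports per-residue mean B-factors and partially occupied atoms. It does not handle PDBx/mmCIF, the PDB's standard archive format; a maintained parsing library is the safer choice for real files.

```python
from collections import defaultdict

# Made-up fragment in legacy PDB format (hypothetical values, not a real entry).
SAMPLE = """\
REMARK   2 RESOLUTION.    1.80 ANGSTROMS.
ATOM      1  N   SER A  10      11.104   6.134  -6.504  1.00 12.50           N
ATOM      2  CA  SER A  10      11.639   6.071  -5.147  1.00 13.10           C
ATOM      3  OG ASER A  10      12.987   5.201  -4.622  0.60 25.40           O
ATOM      4  OG BSER A  10      12.511   7.366  -4.018  0.40 27.90           O
ATOM      5  CA  LYS A  11      10.842   6.904  -4.219  1.00 68.30           C
END
"""


def parse_atoms(pdb_text):
    """Read ATOM/HETATM records using the fixed PDB column positions."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append({
            "name": line[12:16].strip(),
            "altloc": line[16].strip(),        # column 17
            "resname": line[17:20].strip(),
            "chain": line[21].strip(),
            "resseq": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            "occupancy": float(line[54:60]),   # columns 55-60
            "b": float(line[60:66]),           # columns 61-66
        })
    return atoms


def resolution(pdb_text):
    """Return the REMARK 2 resolution in Angstroms, or None if absent or not numeric."""
    for line in pdb_text.splitlines():
        tokens = line.split()
        if tokens[:3] == ["REMARK", "2", "RESOLUTION."] and len(tokens) > 3:
            try:
                return float(tokens[3])
            except ValueError:
                return None
    return None


atoms = parse_atoms(SAMPLE)
print("resolution:", resolution(SAMPLE), "Å")

b_by_residue = defaultdict(list)
for atom in atoms:
    b_by_residue[(atom["chain"], atom["resseq"], atom["resname"])].append(atom["b"])
for (chain, resseq, resname), bs in sorted(b_by_residue.items()):
    print(f"{chain} {resname}{resseq}: mean B = {sum(bs) / len(bs):.1f} Å²")

for atom in atoms:
    if atom["occupancy"] < 1.0:
        print(f"partial occupancy: {atom['resname']}{atom['resseq']} {atom['name']} "
              f"altloc {atom['altloc']} occupancy {atom['occupancy']:.2f}")
```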
The protein
Structural data consumers should remember that structural models are not as rigid as they might seem. The ability to “measure” interatomic distances to hundredths of an Ångstrom in a graphics program does not mean those distances are known to anywhere near that level of accuracy. A molecular dynamics movie of a protein in solution shows it to be incredibly dynamic, bouncing and vibrating crazily (and anisotropically). It is entropically “expensive” to immobilize a floppy molecule …
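One way to see why hundredths of an Ångstrom are not meaningful is the isotropic displacement parameter (B-factor) recorded for every atom in the coordinate file. Under the standard relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement along one direction, a B-factor translates into an RMS displacement. The sketch below is illustrative only: B-factors lump together thermal motion, static disorder, and some modeling error, so the result gives a sense of scale rather than an error bar on any particular distance.

```python
import math


def rms_displacement(b_factor):
    """RMS displacement (Å) along one direction implied by an isotropic B-factor (Å²),
    using B = 8 * pi^2 * <u^2>."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8 * math.pi ** 2))


for b in (10, 20, 40, 80):
    print(f"B = {b:3d} Å²  ->  RMS displacement ≈ {rms_displacement(b):.2f} Å")
```

A B-factor of 20 Å² corresponds to an RMS displacement of about 0.5 Å, and 80 Å² to about 1 Å, far larger than the hundredths of an Ångstrom a graphics program will happily display.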
Conclusions
Coordinate files downloaded from the PDB contain three-dimensional models that are built to approximate electron density maps derived from crystallographic data. Not all areas of the map are equally well drawn, so structure users must be careful not to base their hypotheses on areas of the map that ancient mapmakers would have labeled “Here Be Dragons.” Yet when the coordinate file is viewed in a graphics program, all areas of the model appear at first glance to be equally sound. In …
Funding sources
This work was supported by the Purdue University College of Agriculture and MB-22 from the Pacific Enzyme Science Trust (T.J.K.), K02 AI093675 from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health (A.L.L.), and MCB7171573 from the National Science Foundation, Directorate of Biological Sciences (N.R.S.). Use of the Advanced Photon Source was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Contract …
Acknowledgments
We thank Drs. Aron Fenton, Andy Gulick, Joe Jez, Graham Moran, Jeramia Ory, Emily Scott, and Courtney Starks for comments on the manuscript and synchrotron beamline scientists across the country for their help and advice. TJK thanks Drs. Paul Harkins, Courtney Starks, and I. I. Mathews for encouraging an interest in crystallography. ALL thanks Drs. Marcia Newcomer, Paula Flicker and Amy Rosenzweig for crystallographic training. NRS thanks Drs. Judith Kelly, Karen Allen, and Michael McDonough …
References (56)
- A new generation of crystallographic validation tools for the protein data bank. Structure (2011).
- Study of protein dynamics by X-ray diffraction. Methods Enzymol. (1986).
- On the relationship between diffraction patterns and motions in macromolecular crystals. Structure (2009).
- Heterogeneity and inaccuracy in protein structures solved by X-ray crystallography. Structure (2004).
- Limitations and lessons in the use of X-ray structural information in drug design. Drug Discov. Today (2008).
- Structures of Clostridium botulinum Neurotoxin Serotype A Light Chain complexed with small-molecule inhibitors highlight active-site flexibility. Chem. Biol. (2007).
- Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. (2003).
- Avoidable errors in deposited macromolecular structures: an impediment to efficient data mining. IUCrJ (2014).
- Protein crystallography for non-crystallographers, or how to get the best (but not more) from published macromolecular structures. FEBS J. (2008).
- MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. (2010).
- Validation of protein crystal structures. Acta Crystallogr. D Biol. Crystallogr.
- Statistical quality indicators for electron-density maps. Acta Crystallogr. D Biol. Crystallogr.
- Coot: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr.
- Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr.
- Application and limitations of X-ray crystallographic data in structure-based ligand and drug design. Angew. Chem.
- UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem.
- Biomolecules in the computer: Jmol to the rescue. Biochem. Mol. Biol. Educ.
- Use of TLS parameters to model anisotropic displacements in macromolecular refinement. Acta Crystallogr. D Biol. Crystallogr.
- A molecular viewer for the analysis of TLS rigid-body motion in macromolecules. Acta Crystallogr. D Biol. Crystallogr.
- The Uppsala Electron-Density Server. Acta Crystallogr. D Biol. Crystallogr.
- Iterative-build OMIT maps: map improvement by iterative model building and refinement without model bias. Acta Crystallogr. D.
- Model bias in macromolecular crystal-structures. Acta Crystallogr. A.
- The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D Biol. Crystallogr.
- Visualizing ligand molecules in Twilight electron density. Acta Crystallogr. F Struct. Biol. Cryst. Commun.
- PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D Biol. Crystallogr.
- Molecular configuration in sodium thymonucleate. 1953. Nature.
- A structure for deoxyribose nucleic acid. 1953. Nature.
- Molecular structure of deoxypentose nucleic acids. Nature.