Review
You are lost without a map: Navigating the sea of protein structures

https://doi.org/10.1016/j.bbapap.2014.12.021

Highlights

  • Protein crystal structures are not photographs; they are models based on data.

  • All models require subjective interpretation of electron density maps.

  • Even excellent structures have ambiguous regions, often in the most interesting parts.

  • Prudent structure users always look at the electron density.

Abstract

X-ray crystal structures propel biochemistry research like no other experimental method, since they answer many questions directly and inspire new hypotheses. Unfortunately, many users of crystallographic models mistake them for actual experimental data. Crystallographic models are interpretations, several steps removed from the experimental measurements, making it difficult for nonspecialists to assess the quality of the underlying data. Crystallographers mainly rely on “global” measures of data and model quality to build models. Robust validation procedures based on global measures now ensure that structures in the Protein Data Bank (PDB) are largely correct. However, global measures do not allow users of crystallographic models to judge the reliability of “local” features in a region of interest. Refinement of a model to fit into an electron density map requires interpretation of the data to produce a single “best” overall model. This process requires inclusion of the most probable conformations in areas of poor density. Users who misunderstand this can be misled, particularly in regions of the structure that are mobile, including active sites, surface residues, and especially ligands. This article aims to equip users of macromolecular models with tools to critically assess local model quality. Structure users should always check the agreement of the electron density map and the derived model in all areas of interest, even if the global statistics are good. We provide illustrated examples of interpreted electron density as a guide for those unaccustomed to viewing electron density.

Introduction

Advances in crystallization, data collection, and computers have made macromolecular crystal structures commonplace. Biochemists, medicinal chemists, chemical biologists and many others have come to rely on macromolecular structural data as never before, and it has become routine to read, write, and review manuscripts that contain crystal structures. Furthermore, advances in the field have made it possible for scientists with limited training in crystallography to determine protein structures. Thus, even scientists with no formal background in crystallography need to know how to critically evaluate these complex experiments. While it has been noted recently that poorly determined structures have a negative impact on the drug design community [2], the focus here is on how to avoid the improper use of well-determined structural models. The first step is to understand how crystallographic models are made.

Every atom in the repeating unit of a crystal (the unit cell) contributes to the intensity of every reflection in the diffraction pattern. The measured intensity for each diffraction spot is the result of scattering from the entire model. Particular data points cannot be associated with specific parts of a model. For example, there is no “metal spot” in data collected from a metalloprotein crystal; the metal contributes to the intensity of every reflection (see Box A for a description of the crystallographic experiment). While crystallographic statistics reported in structure papers provide numerical indications of the overall quality of the diffraction data (for an excellent review, see [3]), these do not report on how well-determined individual parts of a model are. The Protein Data Bank (PDB) has recently adopted a new structure report format that gives a graphical representation of how a given model compares with others in the PDB in terms of five statistical measures of model quality [4], [5], [6], [7]. These reports are based on the excellent work of numerous leaders in the field of X-ray structure determination [6]. As good as these reports are, they are focused on the global quality of the structure.
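The statement that there is no “metal spot” can be illustrated with the standard structure factor sum, F(hkl) = Σ_j f_j exp[2πi(hx_j + ky_j + lz_j)], in which every atom j appears in every reflection. The following is a minimal sketch, not taken from the article: the atom positions and scattering weights are invented, and each atom's scattering factor is treated as a constant (real scattering factors fall off with resolution, and atomic displacement is ignored here).

```python
import cmath
import itertools

# Hypothetical toy unit cell: (scattering weight, x, y, z), with x, y, z in
# fractional coordinates. The weights loosely mimic electron counts; every
# value here is invented for illustration only.
LIGHT_ATOMS = [
    (6.0, 0.10, 0.20, 0.30),
    (7.0, 0.45, 0.15, 0.80),
    (8.0, 0.70, 0.65, 0.25),
]
METAL = (26.0, 0.33, 0.52, 0.61)


def structure_factor(hkl, atoms):
    """Sum the contribution of every atom to one reflection (h, k, l)."""
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
        for f, x, y, z in atoms
    )


def metal_contributions(max_index=2):
    """Return {hkl: |F_with_metal - F_without_metal|} for a block of reflections."""
    result = {}
    indices = range(-max_index, max_index + 1)
    for hkl in itertools.product(indices, repeat=3):
        f_without = structure_factor(hkl, LIGHT_ATOMS)
        f_with = structure_factor(hkl, LIGHT_ATOMS + [METAL])
        result[hkl] = abs(f_with - f_without)
    return result


if __name__ == "__main__":
    # Print how much the metal shifts each structure factor.
    for hkl, delta in sorted(metal_contributions(1).items()):
        print(hkl, round(delta, 3))
```

In this simplified model the metal shifts the complex structure factor of every reflection by the same magnitude (its full scattering weight), only with a different phase each time. No reflection is left untouched, so no subset of the data can be singled out as reporting on the metal alone.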

Even in the best cases, there are areas of the electron density map that are poorly defined (Fig. 1). Thus, even a crystal structure that is based on high quality diffraction data and was carefully and competently built and refined will have local areas of the model that are less reliable than the rest. Very often, these regions are on the surface of a protein, and for most users, will not be important in drawing conclusions about molecular structure and function. Of course, if one is interested in protein–protein interactions, these regions are relevant. One's interests determine which parts of the electron density map to inspect.

Regions of the electron density map that are poorly defined due to mobile, disordered sections of the polypeptide frequently have important functions. For example, an enzyme may adopt multiple conformations associated with substrate entry, catalysis, and product egress. In addition, no protein model is produced entirely objectively, since human judgment always plays a role. Recognizing where uncertainty and bias may intrude is an important skill for a structure user who wishes to extract meaningful biological or chemical conclusions from a structure model. To assess which parts of a model are strongly supported by the data and which are less so, one cannot rely on statistical indicators, but should instead examine the electron density maps in regions of functional interest. Fortunately, most journals now require authors to deposit structure factors (the processed experimental data with associated phase estimates) along with the atomic coordinates in the PDB, making it easy to generate maps. The only way for users of macromolecular structures to evaluate the quality of the electron density maps used to build a model is to actually look at them. To avoid basing important experiments on weak structural data, users of macromolecular models must judge which parts of a model are relevant and trustworthy. The information content of the model might not support every idea the structure user has about the molecule.

Section snippets

The coordinate file

Scientists who work with protein structures routinely download PDB coordinate files from the Protein Data Bank and view models in a graphics program such as COOT [8], [9], PyMOL [10], Chimera [11], or JMOL [12]. The coordinate file is actually a simple text file that can be inspected with any text editor (Box B). Model users are encouraged to inspect the file in this way, because the header of the PDB file contains important information regarding the protein sample, the experimental setup, and
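Because the coordinate file is plain text, its contents can also be read with a few lines of code. The sketch below is only an illustration under stated assumptions: the sample lines are invented and abbreviated (a real file has many more header records and trailing element columns), and the function name `read_pdb_text` is hypothetical. The column positions follow the fixed-column layout of the legacy PDB format.

```python
# Invented, abbreviated sample in legacy PDB fixed-column format.
SAMPLE_PDB = """\
EXPDTA    X-RAY DIFFRACTION
REMARK   2 RESOLUTION.    1.80 ANGSTROMS.
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38
ATOM      3  CB  LYS A  57      12.004   8.771  30.115  0.50 48.20
"""


def read_pdb_text(text):
    """Extract the method, resolution, and per-atom fields from PDB text."""
    info = {"method": None, "resolution": None, "atoms": []}
    for line in text.splitlines():
        record = line[:6].strip()
        if record == "EXPDTA":
            info["method"] = line[6:].strip()
        elif line.startswith("REMARK   2 RESOLUTION."):
            tokens = line.split()
            try:
                info["resolution"] = float(tokens[3])
            except (IndexError, ValueError):
                pass  # e.g. "NOT APPLICABLE" for non-diffraction entries
        elif record in ("ATOM", "HETATM"):
            info["atoms"].append({
                "name": line[12:16].strip(),       # columns 13-16
                "res_name": line[17:20].strip(),   # columns 18-20
                "chain": line[21],                 # column 22
                "res_seq": int(line[22:26]),       # columns 23-26
                "x": float(line[30:38]),           # columns 31-38
                "y": float(line[38:46]),           # columns 39-46
                "z": float(line[46:54]),           # columns 47-54
                "occupancy": float(line[54:60]),   # columns 55-60
                "b_factor": float(line[60:66]),    # columns 61-66
            })
    return info


if __name__ == "__main__":
    parsed = read_pdb_text(SAMPLE_PDB)
    print(parsed["method"], parsed["resolution"])
    for atom in parsed["atoms"]:
        print(atom["res_name"], atom["res_seq"], atom["name"],
              atom["occupancy"], atom["b_factor"])
```

Reading the file this way makes it plain that each atom is stored as a line of numbers, including occupancy and temperature factor columns that a graphical viewer does not show by default.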

The protein

Structural data consumers should remember that structural models are not as rigid as they might seem. The ability to “measure” interatomic distances to hundredths of an Ångstrom from a model using a graphics program does not mean they are actually known to anywhere near that level of accuracy. A molecular dynamics movie of a protein in solution shows it to be incredibly dynamic, bouncing and vibrating crazily (and anisotropically). It is entropically “expensive” to immobilize a floppy molecule

Conclusions

Coordinate files downloaded from the PDB contain three-dimensional models that are built to approximate electron density maps derived from crystallographic data. Not all areas of the map are equally well drawn, so structure users must be careful not to base their hypotheses on areas of the map that ancient mapmakers would have labeled “Here Be Dragons.” Nevertheless, all areas of the model appear at first glance to be equally sound when the coordinate file is viewed in a graphical viewer. In

Funding sources

This work was supported by the Purdue University College of Agriculture and MB-22 from the Pacific Enzyme Science Trust (T.J.K.), K02 AI093675 from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health (A.L.L.), and MCB7171573 from the National Science Foundation, Directorate for Biological Sciences (N.R.S.). Use of the Advanced Photon Source was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Contract

Acknowledgments

We thank Drs. Aron Fenton, Andy Gulick, Joe Jez, Graham Moran, Jeramia Ory, Emily Scott, and Courtney Starks for comments on the manuscript and synchrotron beamline scientists across the country for their help and advice. TJK thanks Drs. Paul Harkins, Courtney Starks, and I. I. Mathews for encouraging an interest in crystallography. ALL thanks Drs. Marcia Newcomer, Paula Flicker and Amy Rosenzweig for crystallographic training. NRS thanks Drs. Judith Kelly, Karen Allen, and Michael McDonough

References (56)

  • G.J. Kleywegt

    Validation of protein crystal structures

    Acta Crystallogr. D Biol. Crystallogr.

    (2000)
  • I.J. Tickle

    Statistical quality indicators for electron-density maps

    Acta Crystallogr. D Biol. Crystallogr.

    (2012)
  • P. Emsley et al.

    Coot: model-building tools for molecular graphics

    Acta Crystallogr. D Biol. Crystallogr.

    (2004)
  • P. Emsley et al.

    Features and development of Coot

    Acta Crystallogr. D Biol. Crystallogr.

    (2010)
  • A.M. Davis et al.

    Application and limitations of X-ray crystallographic data in structure-based ligand and drug design

    Angew. Chem.

    (2003)
  • E.F. Pettersen et al.

    UCSF Chimera—a visualization system for exploratory research and analysis

    J. Comput. Chem.

    (2004)
  • A. Herraez

    Biomolecules in the computer: Jmol to the rescue

    Biochem. Mol. Biol. Educ.

    (2006)
  • M.D. Winn et al.

    Use of TLS parameters to model anisotropic displacements in macromolecular refinement

    Acta Crystallogr. D Biol. Crystallogr.

    (2001)
  • J. Painter et al.

    A molecular viewer for the analysis of TLS rigid-body motion in macromolecules

    Acta Crystallogr. D Biol. Crystallogr.

    (2005)
  • G.J. Kleywegt et al.

    The Uppsala Electron-Density Server

    Acta Crystallogr. D Biol. Crystallogr.

    (2004)
  • T.C. Terwilliger et al.

    Iterative-build OMIT maps: map improvement by iterative model building and refinement without model bias

    Acta Crystallogr. D Biol. Crystallogr.

    (2008)
  • A. Hodel et al.

    Model bias in macromolecular crystal-structures

    Acta Crystallogr. A

    (1992)
  • Collaborative Computational Project, Number 4

    The CCP4 suite: programs for protein crystallography

    Acta Crystallogr. D Biol. Crystallogr.

    (1994)
  • C.X. Weichenberger et al.

    Visualizing ligand molecules in Twilight electron density

    Acta Crystallogr. F Struct. Biol. Cryst. Commun.

    (2013)
  • P.D. Adams et al.

    PHENIX: a comprehensive Python-based system for macromolecular structure solution

    Acta Crystallogr. D Biol. Crystallogr.

    (2010)
  • R.E. Franklin et al.

    Molecular configuration in sodium thymonucleate. 1953

    Nature

    (2003)
  • J.D. Watson et al.

    A structure for deoxyribose nucleic acid. 1953

    Nature

    (2003)
  • M.H. Wilkins et al.

    Molecular structure of deoxypentose nucleic acids

    Nature

    (1953)