Data mining of metal ion environments present in protein structures

https://doi.org/10.1016/j.jinorgbio.2008.05.006

Abstract

Analysis of metal–protein interaction distances, coordination numbers, B-factors (displacement parameters), and occupancies of metal-binding sites in protein structures determined by X-ray crystallography and deposited in the PDB shows many unusual values and unexpected correlations. By measuring the frequency of each amino acid in metal ion-binding sites, the positive or negative preferences of each residue for each type of cation were identified. Our approach may be used for fast identification of metal-binding structural motifs that cannot be identified on the basis of sequence similarity alone. The analysis compares data derived separately from high- and medium-resolution structures from the PDB with those from very high-resolution small-molecule structures in the Cambridge Structural Database (CSD). For high-resolution protein structures, the distribution of metal–protein or metal–water interaction distances agrees quite well with data from the CSD, but the distribution is unrealistically wide for medium-resolution (2.0–2.5 Å) data. Our analysis of cation B-factors versus average B-factors of atoms in the cation environment reveals that a substantial number of structures contain either an incorrect metal ion assignment or an unusual coordination pattern. A correlation between data resolution and completeness of the metal coordination spheres is also found.

Introduction

Metal ions are frequently observed in protein structures, and are often crucial for protein function, stability, or both. Moreover, in many cases metal ions are critical for crystal formation, as the ions mediate crystal contacts between proteins. In the February 20, 2007 release of the Protein Data Bank (PDB) [1], approximately 30% of structures contained metal ions. Among 23,537 structures of proteins complexed with one or more small-molecule ligands, 20% contained one or more metal ions close to the ligand binding site that are likely to interact either directly or indirectly with the ligand: 10% of the structures have a direct cation–ligand contact and the other 10% have a cation–ligand interaction bridged by an amino acid or ordered water molecules. A detailed analysis of the metal coordination architecture within proteins therefore represents an important addition to the understanding of the biochemical functions of metalloproteins.

The ratio of the number of observed data to the number of parameters used in structure refinement depends on the data resolution and the number of atoms in a crystallographic asymmetric unit. For macromolecular structures, this ratio is usually low, due to the limited resolution of the data used to determine such structures. Therefore, model restraints are used almost universally in model building and structure refinement [2]. In addition to the stereochemical restraints for the macromolecule itself [3], [4], it is essential to apply restraints to the metal ion-binding site (and subsequently interpret the electron density) taking into account the coordination properties of the cation. In all the most popular programs used for macromolecular structure refinement, the restraints for metal–ligand interactions must be manually defined by the user. While the stereochemistry of proteins and nucleic acids is well understood, there is no universal approach to describe the geometry of metal ion-binding sites. Alkaline earth cations such as calcium and magnesium are relatively easy to identify in electron density, as the geometrical parameters (e.g. bond lengths and coordination number) of their binding sites are very well characterized [5], [6], [7], [8]. Alkali metal ions such as sodium and potassium, however, are more difficult to identify because their coordination spheres are not as regular as those of alkaline earth metal ions [9]. Transition metals have even more complex binding patterns: not only can their coordination numbers vary, but they can also have different oxidation states. The bond lengths for transition metals depend on their oxidation state, and even within the same oxidation state different bond lengths are observed due to known geometrical distortions of the coordination spheres, for example due to the Jahn–Teller effect [10] or a different spin state.

Studies describing the geometry of metal ion-binding sites within proteins and in small-molecule structures have recently been discussed extensively in a series of papers by Harding [5], [6], [7], [8], [9], [11]. Here, in contrast, our objective is to analyze the properties of metal ion-binding sites in protein structures as a function of structure resolution and crystallographic methodology. In particular, we report a relational database approach to statistically analyze metal ion sites in protein structures present in the PDB [1], and compare them to high-resolution small-molecule structures obtained from the Cambridge Structural Database (CSD) [12]. We examined not only the distributions of bond lengths and coordination numbers but also the B-factors (displacement parameters, sometimes referred to as ‘temperature factors’) and relative occupancies of metal ions versus their coordinating atoms. The distributions were cross-correlated with the computer programs used for structure refinement. Our results show some abnormally high or low values of bond lengths and B-factors in metal-binding sites reported in the PDB. Despite many theoretical papers describing proper geometrical restraints for metal ion environments, our examination of recent structures indicates that those restraints are often not properly used in structure refinement.
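The comparison of a metal ion's B-factor with the B-factors of the atoms around it can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the relational database implementation used in the study: the `Atom` record, the function names, the 3 Å default cutoff, and the ratio thresholds (0.5 and 2.0) are all hypothetical choices made for demonstration.

```python
from dataclasses import dataclass
from math import dist
from statistics import mean


@dataclass(frozen=True)
class Atom:
    """Hypothetical minimal atom record (not a PDB parser)."""
    element: str
    xyz: tuple          # Cartesian coordinates in Å
    b_factor: float     # isotropic B-factor in Å^2
    occupancy: float = 1.0


def environment(metal, atoms, cutoff=3.0):
    """Return all atoms other than the metal itself within `cutoff` Å of it."""
    return [a for a in atoms
            if a is not metal and dist(a.xyz, metal.xyz) <= cutoff]


def b_factor_ratio(metal, atoms, cutoff=3.0):
    """Metal B-factor divided by the mean B-factor of its environment."""
    env = environment(metal, atoms, cutoff)
    if not env:
        return None  # isolated ion: nothing to compare against
    return metal.b_factor / mean(a.b_factor for a in env)


def flag_site(metal, atoms, low=0.5, high=2.0, cutoff=3.0):
    """Label a site using assumed (illustrative) ratio thresholds."""
    ratio = b_factor_ratio(metal, atoms, cutoff)
    if ratio is None:
        return "no coordinating atoms found"
    if ratio < low:
        return "metal B-factor unusually low"
    if ratio > high:
        return "metal B-factor unusually high"
    return "consistent"


if __name__ == "__main__":
    # Invented coordinates and B-factors, for demonstration only.
    site = [
        Atom("ZN", (0.0, 0.0, 0.0), 45.0),
        Atom("N", (2.1, 0.0, 0.0), 15.0),
        Atom("S", (0.0, 2.3, 0.0), 15.0),
        Atom("O", (0.0, 0.0, 6.0), 20.0),   # outside the 3 Å cutoff
    ]
    print(flag_site(site[0], site))
```

A ratio far from 1 does not by itself say what is wrong with a site; it only marks the site for inspection, for example for a possibly incorrect metal assignment or an unusual coordination pattern.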

Data set under investigation

This work is based on the PDB release of February 20, 2007 (41,814 structures). All structures in the PDB that contain one or more Ca, Mg, Na, K, Mn, Co, Fe, Zn, Ni, or Cu cations are included in the statistical analysis unless otherwise specified. In the analyses of structure resolution, B-factor, or occupancy, only metal ion-binding sites in protein structures solved by X-ray crystallography were included. For purposes of comparative analysis, the set was subdivided; structures with

Atom type and amino acid profiles of metal ion-binding sites

The distribution of normalized frequencies F_atom of atoms located within 3 Å of the metal ion is shown in Table 1. The same table generated with a cutoff of 4 Å gives similar, but somewhat noisier, results. The non-redundant subset of structures, which contains around 30% of the complete data set, gives results very similar to those for the complete data set shown in Table 1. The number of interactions listed in the last row of both Table 1a and b represents the number of pairs (in this case, a metal ion
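The counting step behind a table of this kind can be sketched as follows. The exact normalization of the frequencies is not given in this excerpt, so the sketch assumes the simplest choice: each atom-type count is divided by the total number of metal-atom pairs found for that metal. The tuple-based input format, the atom-type labels, and the function name `contact_frequencies` are hypothetical.

```python
from collections import Counter, defaultdict
from math import dist


def contact_frequencies(metals, atoms, cutoff=3.0):
    """Count metal-atom pairs within `cutoff` Å and normalize per metal.

    metals: list of (element, xyz) tuples
    atoms:  list of (atom_type, xyz) tuples, e.g. ("carboxylate O", (x, y, z))

    Returns (freqs, totals): freqs[element][atom_type] is the fraction of
    that metal's contacts made with the given atom type (assumed
    normalization), and totals[element] is the number of pairs counted.
    """
    counts = defaultdict(Counter)
    for element, m_xyz in metals:
        for atom_type, a_xyz in atoms:
            if dist(m_xyz, a_xyz) <= cutoff:
                counts[element][atom_type] += 1

    freqs, totals = {}, {}
    for element, counter in counts.items():
        total = sum(counter.values())
        totals[element] = total
        freqs[element] = {t: n / total for t, n in counter.items()}
    return freqs, totals


if __name__ == "__main__":
    # Invented coordinates, for demonstration only.
    metals = [("CA", (0.0, 0.0, 0.0))]
    atoms = [
        ("carboxylate O", (2.4, 0.0, 0.0)),
        ("carboxylate O", (0.0, 2.4, 0.0)),
        ("water O", (0.0, 0.0, 2.4)),
        ("backbone O", (0.0, 0.0, 3.5)),   # beyond 3 Å, within 4 Å
    ]
    for cutoff in (3.0, 4.0):
        print(cutoff, contact_frequencies(metals, atoms, cutoff))
```

Running the same function with `cutoff=4.0` mirrors the comparison between the 3 Å and 4 Å tables: the wider shell picks up additional, more distant atoms.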

Atoms and amino acids participating in metal ion-binding

All analyzed metal ions except Cu show a preference for interaction with a side chain carboxylate group (Table 1). Alkaline earth metal ions (Ca²⁺, Mg²⁺) exhibit the highest preference for coordination by side chain carboxylate groups followed by a weaker preference for interaction with oxygen atoms from side chain amide groups. Alkali metal ions (Na⁺, K⁺) are preferred approximately equally by all types of oxygen atoms. Metal ions from both the imidazole class (Mn, Co, Fe) and the sulfur class

Conclusion

Analysis of PDB structures that contain metal ions reveals that, despite several publications providing an excellent description of the geometry of metal ion environments, there are still many structures (even some solved very recently) that have quite unusual geometry. Often, the geometries of metal ion-binding sites were not properly restrained, most probably due to the lack of mechanisms to automatically generate such restraints in all of the commonly used refinement programs. We suggest

Acknowledgments

We would like to thank Zbigniew Dauter, Andrzej Joachimiak, and Matthew Zimmerman for critically reading the manuscript and making valuable comments. The work was supported by NIH Grants GM74942 and GM53163.

References (22)

  • L. Rulisek et al., J. Inorg. Biochem. (1998)
  • H.M. Berman et al., Nucleic Acids Res. (2000)
  • P.R. Evans, Acta Crystallogr. D (2007)
  • R. Engh et al., Acta Crystallogr. A (1991)
  • M. Jaskolski et al., Acta Crystallogr. D (2007)
  • M. Harding, Acta Crystallogr. D (2001)
  • M. Harding, Acta Crystallogr. D (1999)
  • M. Harding, Acta Crystallogr. D (2000)
  • M. Harding, Acta Crystallogr. D (2006)
  • M. Harding, Acta Crystallogr. D (2002)
  • H. Jahn et al., Proc. Roy. Soc. London A (1937)