A new test set for validating predictions of protein-ligand interaction

Proteins. 2002 Dec 1;49(4):457-71. doi: 10.1002/prot.10232.

Abstract

We present a large test set of protein-ligand complexes for validating algorithms that rely on the prediction of protein-ligand interactions. The set consists of 305 complexes with protonation states assigned by manual inspection. The following checks have been carried out to identify unsuitable entries in this set: (1) assessing the involvement of crystallographically related protein units in ligand binding; (2) identifying bad clashes between protein side chains and ligand; and (3) assessing structural errors and/or inconsistency between ligand placement and the crystal structure electron density. In addition, the set has been pruned to ensure diversity in terms of protein-ligand structures, and subsets are supplied for different protein-structure resolution ranges. A classification of the set by protein type is available. As an illustration, validation results are shown for GOLD and SuperStar. GOLD is a program that performs flexible protein-ligand docking, and SuperStar is used for the prediction of favorable interaction sites in proteins. The new CCDC/Astex test set is freely available to the scientific community (http://www.ccdc.cam.ac.uk).
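Two of the curation steps above, flagging protein-ligand clashes (check 2) and supplying subsets by resolution range, can be illustrated with a minimal sketch. The abstract does not state the actual clash criterion or the resolution ranges used, so the 2.5 Å distance cutoff, the function names, and the bin boundaries below are hypothetical placeholders chosen for illustration only. This is a plain pairwise-distance check, not the authors' procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def find_clashes(
    protein_atoms: Sequence[Coord],
    ligand_atoms: Sequence[Coord],
    cutoff: float = 2.5,  # hypothetical cutoff in angstroms, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """Return (protein_index, ligand_index, distance) for every atom pair closer than cutoff."""
    clashes = []
    for i, p in enumerate(protein_atoms):
        for j, q in enumerate(ligand_atoms):
            d = math.dist(p, q)  # Euclidean distance between the two atoms
            if d < cutoff:
                clashes.append((i, j, d))
    return clashes


def split_by_resolution(
    resolutions: Dict[str, float],
    bins: Sequence[Tuple[float, float]],
) -> Dict[Tuple[float, float], List[str]]:
    """Group entry IDs into half-open resolution bins [lo, hi); entries outside all bins are dropped."""
    subsets: Dict[Tuple[float, float], List[str]] = {b: [] for b in bins}
    for entry_id, res in sorted(resolutions.items()):
        for lo, hi in bins:
            if lo <= res < hi:
                subsets[(lo, hi)].append(entry_id)
                break
    return subsets


if __name__ == "__main__":
    # Toy coordinates and placeholder entry IDs, not real test-set data.
    print(find_clashes([(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]))
    print(split_by_resolution({"entry_a": 1.8, "entry_b": 2.2, "entry_c": 3.4},
                              [(0.0, 2.0), (2.0, 2.5)]))
```

The brute-force double loop keeps the sketch dependency-free. For full structures, a spatial index (for example, a neighbour-search grid or k-d tree) would avoid comparing every protein atom with every ligand atom.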

MeSH terms

  • Algorithms
  • Binding Sites
  • Computational Biology / methods*
  • Crystallography, X-Ray
  • Databases, Protein
  • Internet
  • Ligands
  • Protein Binding
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / metabolism*
  • Reproducibility of Results
  • Research Design
  • Software* / standards
  • Solvents
  • Water / chemistry
  • Water / metabolism

Substances

  • Ligands
  • Proteins
  • Solvents
  • Water