Conformer generation with OMEGA: learning from the data set and the analysis of failures

J Chem Inf Model. 2012 Nov 26;52(11):2919-36. doi: 10.1021/ci300314k. Epub 2012 Nov 12.

Abstract

We recently published a high-quality validation set for testing conformer generators, consisting of structures from both the PDB and the CSD (Hawkins, P. C. D. et al. J. Chem. Inf. Model. 2010, 50, 572), and tested the performance of our conformer generator, OMEGA, on these sets. In the present publication, we focus on understanding the suitability of those data sets for validation and on identifying and learning from OMEGA's failures. We compare, for the first time to our knowledge, the coverage of the applicable property spaces between the validation data sets we used and the parent compound sets, to determine whether our data sets adequately sample these property spaces. We also introduce the concept of torsion fingerprinting and compare this method of dissimilation to the more traditional graph-centric diversification methods we used in our previous publication. To improve our ability to programmatically identify cases where the crystallographic conformation is not well reproduced computationally, we introduce a new metric for comparing conformations, RMSTanimoto. This new metric is used alongside those from our previous publication to efficiently identify reproduction failures. We find RMSTanimoto to be particularly effective in identifying failures for the smallest molecules in our data sets. Analysis of the nature of these failures, particularly those for the CSD, sheds further light on the issue of strain in crystallographic structures. Some of the residual failure cases not resolved by simple changes to OMEGA's defaults present significant challenges to conformer generation engines such as OMEGA and suggest new avenues for improving their performance; others illustrate the pitfalls of validating against crystallographic ligand conformations, particularly those from the PDB.

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Crystallography, X-Ray
  • Databases, Chemical
  • Ligands
  • Models, Molecular*
  • Molecular Conformation
  • Proteins / chemistry*
  • Software*

Substances

  • Ligands
  • Proteins