Predicting Pol II promoter sequences using transcription factor binding sites

J Mol Biol. 1995 Jun 23;249(5):923-32. doi: 10.1006/jmbi.1995.0349.

Abstract

A computer program, PROMOTER SCAN, has been developed to recognize a high percentage of Pol II promoter sequences while allowing only a small rate of false positives. A total of 167 primate Pol II promoter sequences, obtained from the Eukaryotic Promoter Database, and 999 primate non-promoter sequences, obtained from the GenBank sequence databank, were used in the analysis. Both promoter and non-promoter sequences were analyzed for the density of each unique mammalian transcription factor binding site listed in the Ghosh Transcription Factor Database. For each binding site, these densities were used to compute the ratio of its density in promoter sequences to its density in non-promoter sequences. The density ratios of all binding sites were then combined into a scoring profile called the Promoter Recognition Profile. PROMOTER SCAN used this profile, in combination with a weighted matrix for scoring a TATA box, to predict promoter sequences and to test how well it discriminates them from non-promoter sequences. When the promoter cutoff score was set so that 70% of promoters were recognized correctly by the program, a false positive rate of about 1/5600 bases was observed in the non-promoter sequence set. PROMOTER SCAN is now being developed for public distribution.
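The abstract does not give PROMOTER SCAN's exact scoring formula, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the program's implementation. It shows the steps named above: a promoter/non-promoter density ratio for each binding site, a profile score for a candidate window, and a TATA weight-matrix score added before comparing against a cutoff. Everything concrete here is hypothetical: binding sites are treated as exact strings, the two toy sites (`SITES`), the pseudocount, the `TATA_PWM` weights, the cutoff, and all helper names are invented for illustration. Combining ratios by summing their logarithms over every site hit is a naive-Bayes-style rule substituted for the unspecified one.

```python
import math
from typing import Dict, List

# Hypothetical toy stand-ins for entries of a transcription factor
# binding-site database (matched as exact strings, forward strand only).
SITES = ["GGGCGG", "CCAAT"]

# Hypothetical TATA-box weight matrix: per-base weights for each of 6 positions.
TATA_PWM = [
    {"T": 1.0, "A": -0.5, "C": -1.0, "G": -1.0},
    {"A": 1.0, "T": 0.2, "C": -1.0, "G": -1.0},
    {"T": 1.0, "A": -0.5, "C": -1.0, "G": -1.0},
    {"A": 1.0, "T": 0.3, "C": -1.0, "G": -1.0},
    {"A": 1.0, "T": 0.3, "C": -1.0, "G": -1.0},
    {"A": 1.0, "T": -0.5, "C": -1.0, "G": -1.0},
]


def count_hits(seq: str, site: str) -> int:
    """Count (possibly overlapping) exact occurrences of `site` in `seq`."""
    hits, start = 0, seq.find(site)
    while start != -1:
        hits += 1
        start = seq.find(site, start + 1)
    return hits


def site_density(seqs: List[str], site: str, pseudo: float = 0.5) -> float:
    """Hits per base over a sequence set; `pseudo` (assumed) avoids zero densities."""
    total_len = sum(len(s) for s in seqs)
    return (sum(count_hits(s, site) for s in seqs) + pseudo) / total_len


def density_ratios(promoters: List[str], background: List[str],
                   sites: List[str]) -> Dict[str, float]:
    """Density in promoter sequences divided by density in non-promoter sequences."""
    return {s: site_density(promoters, s) / site_density(background, s) for s in sites}


def profile_score(window: str, ratios: Dict[str, float]) -> float:
    """Assumed combination rule: sum of log density ratios over all site hits."""
    return sum(count_hits(window, s) * math.log(r) for s, r in ratios.items())


def tata_score(window: str, pwm=TATA_PWM) -> float:
    """Best weight-matrix score over all offsets (0.0 if the window is too short)."""
    width = len(pwm)
    scores = [
        sum(col.get(base, -1.0) for col, base in zip(pwm, window[i:i + width]))
        for i in range(len(window) - width + 1)
    ]
    return max(scores, default=0.0)


def is_promoter(window: str, ratios: Dict[str, float], cutoff: float) -> bool:
    """Call a promoter when profile score plus TATA score reaches `cutoff`."""
    return profile_score(window, ratios) + tata_score(window) >= cutoff


# Usage on toy data (not the paper's sequence sets).
promoters = ["GGGCGGAACCAAT"]
background = ["ATGCATGCATGCCAATATGC"]
ratios = density_ratios(promoters, background, SITES)
print(is_promoter("GGGCGGTATAAACCAAT", ratios, cutoff=5.0))  # True
print(is_promoter("ATGCATGCATGCATGC", ratios, cutoff=5.0))   # False
```

Raising `cutoff` trades sensitivity for fewer false positives. The operating point reported in the abstract (70% of promoters recognized, about one false positive per 5600 bases) comes from setting that threshold against real promoter and non-promoter sets, which this toy example does not attempt to reproduce.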

MeSH terms

  • Algorithms
  • Animals
  • Base Sequence
  • Binding Sites
  • Databases, Factual
  • Humans
  • Molecular Sequence Data
  • Promoter Regions, Genetic*
  • RNA Polymerase II / metabolism*
  • Software
  • TATA Box
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors
  • RNA Polymerase II