Predicting Pol II promoter sequences using transcription factor binding sites

D S Prestridge

doi:10.1006/jmbi.1995.0349

J Mol Biol. 1995 Jun 23;249(5):923-32. doi: 10.1006/jmbi.1995.0349.

Author

D S Prestridge¹

Affiliation

¹ Molecular Biology Computing Center, University of Minnesota, St Paul 55108, USA.

PMID: 7791218
DOI: 10.1006/jmbi.1995.0349

Abstract

A computer program, PROMOTER SCAN, has been developed to recognize a high percentage of Pol II promoter sequences while allowing only a small rate of false positives. A total of 167 primate Pol II promoter sequences, obtained from the Eukaryotic Promoter Database, and 999 primate non-promoter sequences, obtained from the GenBank sequence databank, were used in the analysis. Both promoter and non-promoter sequences were analyzed for the comparative density of each unique mammalian transcription factor binding site listed in the Ghosh Transcription Factor Database. The density of each of these binding sites was then used to derive a ratio of density of each transcriptional element in promoter compared to non-promoter sequences. The combined individual density ratios of all binding sites were then collectively used to build a scoring profile called the Promoter Recognition Profile. This profile, used in combination with a weighted matrix for scoring a TATA box, was then used by the PROMOTER SCAN program to test the prediction of promoter sequences and the ability of the computer program to discriminate them from non-promoter sequences. When the promoter cutoff score was set so that 70% of promoters were recognized correctly by the program, a false positive rate of about 1/5600 bases was observed in the non-promoter sequence set. PROMOTER SCAN is now being developed for public distribution.
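
The density-ratio idea described in the abstract can be illustrated with a minimal Python sketch. This is not the PROMOTER SCAN implementation. The abstract does not say how the individual ratios are combined, how sites are matched, or how the TATA box weight matrix enters the score, so the sketch assumes exact string matching, a simple sum of ratios over the sites found, and no TATA component. The function names, the motif, and the `pseudo` smoothing parameter are all hypothetical.

```python
from typing import Dict, List


def count_sites(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, including overlapping ones."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def site_density(seqs: List[str], motif: str) -> float:
    """Occurrences of motif per base across a set of sequences."""
    total_bases = sum(len(s) for s in seqs)
    total_hits = sum(count_sites(s, motif) for s in seqs)
    return total_hits / total_bases if total_bases else 0.0


def density_ratios(
    promoters: List[str],
    non_promoters: List[str],
    motifs: Dict[str, str],
    pseudo: float = 1e-6,
) -> Dict[str, float]:
    """Ratio of promoter density to non-promoter density for each site.

    `pseudo` is an assumed smoothing term that avoids division by zero
    when a site never occurs in the non-promoter set; it must be > 0
    unless every site occurs at least once in that set.
    """
    return {
        name: (site_density(promoters, m) + pseudo)
        / (site_density(non_promoters, m) + pseudo)
        for name, m in motifs.items()
    }


def score_sequence(seq: str, motifs: Dict[str, str], ratios: Dict[str, float]) -> float:
    """Assumed scoring rule: sum the density ratio of every site found in seq."""
    return sum(ratios[name] * count_sites(seq, m) for name, m in motifs.items())


# Toy data for illustration only; these are not sequences from the study.
toy_motifs = {"GC-box-like": "GGGCGG"}
toy_promoters = ["GGGCGGAAAAGGGCGG"]      # 2 sites in 16 bases
toy_non_promoters = ["GGGCGGAAAAAAAAAA"]  # 1 site in 16 bases
toy_ratios = density_ratios(toy_promoters, toy_non_promoters, toy_motifs, pseudo=0.0)
toy_score = score_sequence("TTGGGCGGTT", toy_motifs, toy_ratios)
```

In this toy case the site is twice as dense in the promoter set as in the non-promoter set, so each occurrence contributes a ratio of 2.0 to a candidate's score. A cutoff on that score would then trade the fraction of promoters recognized against the false positive rate, which is the trade-off the abstract reports.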

MeSH terms

Algorithms
Animals
Base Sequence
Binding Sites
Databases, Factual
Humans
Molecular Sequence Data
Promoter Regions, Genetic*
RNA Polymerase II / metabolism*
Software
TATA Box
Transcription Factors / metabolism*

Substances

Transcription Factors
RNA Polymerase II