MolPharm xPharm- The Comprehensive Pharmacology Reference

Home Help [Feedback] [For Subscribers] [Archive] [Search] [Contents]
 QUICK SEARCH:   [advanced]


     


Molecular Pharmacology Fast Forward
First published on September 1, 2004; DOI: 10.1124/mol.104.001677


0026-895X/04/6606-1557-1572$20.00
Mol Pharmacol 66:1557-1572, 2004

This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow All Versions of this Article:
mol.104.001677v1
66/6/1557    most recent
Right arrow Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when eLetters are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Kel, A.
Right arrow Articles by Borlak, J.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Kel, A.
Right arrow Articles by Borlak, J.

A Novel Computational Approach for the Prediction of Networked Transcription Factors of Aryl Hydrocarbon-Receptor-Regulated Genes

Alexander Kel, Susanne Reymann, Volker Matys, Paul Nettesheim, Edgar Wingender, and Jürgen Borlak

BIOBASE GmbH, Wolfenbüttel, Germany (A.K., V.M., E.W.); Institute of Cytology and Genetics, Novosibirsk, Russia (A.K.); Fraunhofer Institute of Toxicology and Experimental Medicine, Center for Drug Research and Medical Biotechnology, Hannover, Germany (S.R., J.B.); National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina (P.N.); and Center of Pharmacology and Toxicology, Medical School of Hannover, Hannover, Germany (J.B.)

Received April 21, 2004; accepted August 26, 2004


    Abstract
 Top
 Abstract
 Materials and Methods
 Results
 Discussion
 References
 
A novel computational method based on a genetic algorithm was developed to study composite structure of promoters of coexpressed genes. Our method enabled an identification of combinations of multiple transcription factor binding sites regulating the concerted expression of genes. In this article, we study genes whose expression is regulated by a ligand-activated transcription factor, aryl hydrocarbon receptor (AhR), that mediates responses to a variety of toxins. AhR-mediated change in expression of AhR target genes was measured by oligonucleotide microarrays and by reverse transcription-polymerase chain reaction in human and rat hepatocytes. Promoters and long-distance regulatory regions (>10 kb) of AhR-responsive genes were analyzed by the genetic algorithm and a variety of other computational methods. Rules were established on the local oligonucleotide context in the flanks of the AhR binding sites, on the occurrence of clusters of AhR recognition elements, and on the presence in the promoters of specific combinations of multiple binding sites for the transcription factors cooperating in the AhR regulatory network. Our rules were applied to search for yet unknown Ah-receptor target genes. Experimental evidence is presented to demonstrate high fidelity of this novel in silico approach.


Regulation of gene expression is accomplished through binding of transcription factors (TFs) to distinct regions of DNA (TF binding sites), and, after anchoring at these sites, transmission of the regulatory signal to the basal transcription complex. Some of these TFs are specific for a particular tissue, a definite stage of development, or a given extracellular signal, but most transcription factors are involved in gene regulation under a rather wide spectrum of cellular conditions. It is clear by now that combinations of transcription factors and not single transcription factors drive gene transcription and define its specificity.

The aryl hydrocarbon receptor (AhR) is a well understood ligand-activated transcription factor that mediates responses to a variety of toxins. Among them are halogenated aromatic toxins such as 2,3,7,8-tetrachlorodibenzo-p-dioxin, polynuclear aromatic hydrocarbons, combustion products, and numerous phytochemicals such as flavonoids and indole-3-carbinol. The nuclear AhR complex is a heterodimer composed of the AhR and AhR nuclear translocator (Arnt) proteins. Binding of this complex to so-called dioxin responsive elements leads to activation of a large number of AhR-responsive genes, including phase 1 (e.g., CYP1A1) and phase 2 drug-metabolizing enzymes and also genes coding for cellular differentiation, metabolism, and apoptosis among others (Safe and Wormke, 2003Go).

In-depth studies of the last years revealed many details about the mechanisms of action of the AhR transcription factor and promoter activation of target genes, including CYP1A1 (Whitlock, 1999Go). Protein-DNA interactions of AhR were analyzed, and expression of target genes was studied. Despite our advanced understanding of AhR receptor biology, the precise mechanism of activation of target genes remains uncertain. There are a number of genes known to be regulated (activated or repressed) by AhR in combination with other transcription factors, including AP-1 and ER (Safe and Wormke, 2003Go), but the general combinatoric rules to enable robust prediction of gene expression are lacking. Furthermore, there is a need to identify networks of cooperating transcription factors. Thus, computational approaches for the prediction of the transcriptional network of AhR target genes are needed.

In the past few years, a number of computational approaches were developed addressing the problem of combinatorial regulation of transcription by networks of transcription factors. Specific TF binding site combinations were used for an identification of muscle-specific promoters (Frech et al., 1998Go; Wasserman and Fickett, 1998Go), promoters of liverenriched genes (Tronche et al., 1997Go), of yeast genes (Brazma et al.,1997Go), of immune-specific genes (Kel et al., 1993Go; Boehlk et al., 2000Go; Fessele et al., 2001Go), and promoters of genes regulated during cell cycle (Kel et al., 2003Go). In the database TRANSCompel (Kel-Margoulis et al., 2002aGo), known composite regulatory elements and specific combinations of pairs or triplets of TF binding sites that are located in proximity to each other were collected. This provides information on synergistic or antagonistic effects on gene regulation by different TFs. Nonetheless, information on TF networks is limited, and descriptions of such networks are often highly speculative. With the introduction of global gene expression arrays into the armory of biomedical research, investigators have been enabled to analyze the expression patterns of hundreds or thousands of genes altered in parallel in response to physiological as well as pathological stimuli. To gain insight into the mechanisms governing these changes, we need to acquire a better understanding of the regulatory proteins as well as the regulatory regions of the genes involved. To facilitate the expansion of this knowledge base, one suitable approach is to develop genetic algorithms for the identification and characterization of promoters and other regulatory elements. Including such genetic algorithms, we have developed a computational strategy that we used to analyze promoters of differentially expressed genes regulated by the AhR. Applying this strategy, we identified composite modules that include AhR binding sites arranged in clusters and accompanied by a number of colocalized specific DNA motifs, including hepatic nuclear factors, CCAAT/enhancer-binding proteins, and others. The revealed composite modules were used to identify novel gene targets based on whole genome searches.


    Materials and Methods
 Top
 Abstract
 Materials and Methods
 Results
 Discussion
 References
 
Tissue and Cultures. Isolation of human and rat hepatocytes was done as described previously (Borlak et al., 2002Go, 2003Go). Male Sprague-Dawley rats weighing 200 to 220 g were obtained from Charles River Laboratories (Sulzfeld, Germany). Anesthesia was done with ketamine/xylazine. After midline incision, the liver was perfused via the portal vein with 200 ml of Ca2+-free Krebs-Ringer buffer and subsequently with 100 ml containing Ca2+ and collagenase at 37°C. The liver capsule was gently removed, and liver tissue was washed as described above. Primary human and rat hepatocytes were cultured enclosed between two layers of collagen as described previously (Dunn et al., 1991Go).

RNA Isolation and Production of Copy RNA. Total RNA was isolated using RNeasy total RNA isolation kit (QIAGEN GmbH, Hilden, Germany) according to the manufacturer's recommendation. Ten micrograms of total RNA was used for the synthesis of double-stranded cDNA with Superscript II reverse transcriptase. Primer extension was with high-performance liquid chromatography-purified T7-(dT)24 (GenSet SA, Paris, France) as a primer. After clean up, the double-stranded cDNA was used for the synthesis of biotin-labeled cRNA (ENZO BioArray High Yield RNA Transcript Labeling kit; Affymetrix, Santa Clara, CA). cRNA was purified with RNeasy spin columns from QIAGEN and cleaved into fragments of 35 to 200 bases by metal-induced hydrolysis.

Microarray Experiments. This was done according to the manufacturer's recommendations as detailed in Singh-Gasson et al. (1999Go) with n = 3 repeats. Furthermore, we used n = 15 cDNA arrays containing 302 well known genes that code for detoxification, cell proliferation, tumor development, heat shock response, signal transduction, apoptosis, and cell cycle regulation (J. Borlak, J. Drewes, K. Hofmann, and A. Bosio, submitted).

Thermocycler RT-PCR. For PCR amplification of cDNA, a 25-µl reaction mixture was prepared containing 10x polymerase reaction buffer, 1.5 mM MgCl2, 0.4 nM dNTPs (Roche Diagnostics, Mannheim, Germany), 400 nM concentration of the 3'- and 5'-specific primers (synthesized by Invitrogen, Karlsruhe, Germany), 1 U of Taq-polymerase, and 1 µl of cDNA.

PCR reactions were carried out in a thermal cycler (T3; Biometra, Göttingen, Germany) using primer-specific melting, annealing, and extension cycling conditions. DNA contamination was checked for by direct amplification of RNA extracts before conversion of RNA to cDNA and was excluded. PCRs were done within the linear range of amplification. PCR products were separated using an agarose gel, stained with ethidium bromide, and photographed on a transilluminator. A semiquantitative measurement was performed using the program NIH Image version 1.62.

Sequence Retrieval. One hundred eighteen (68 human and 50 rat) differentially expressed genes were used as training sets (Table 1). For 41 human and 37 rat genes, the level of expression was determined experimentally.


View this table:
[in this window]
[in a new window]
 
TABLE 1 Expression levels of different genes after treatment of human or rat hepatocytes with Aroclor 1254, and the LocusLink ID (identifier) of these genes

 

The Ensembl database and UCSC Genome browser was used to extract the 5' regions of the AhR-regulated genes and to build the training sets of sequences. For the majority of the genes and based on the Ensembl annotation of the first exons, 5' regions could be retrieved. The beginning of the first exon was considered as a tentative transcription start site (TSS). However, for some genes, identification of TSS was difficult. In these cases, several possible 5' regions were considered. For some of genes given in Table 1, the annotation of promoters in genomes was not reliable (based on mapping of 5' incomplete expressed sequence tags or based on an in silico gene prediction). In such cases, we did not include these genes into the training sets. Finally, 96 5' regions of these genes were included into the training set (81 human and 16 rat). Four sets of subsequences were prepared: (-500/+100), (-1000/+100), (-2000/+2000), and (-20,000-gene-+20,000). We refer to these sequences as G600, G1100, G4000, and G20gene20kb sets.

Another training subset contained 5' upstream regions for 58 genes (42 human and 16 rat) for which gene expression was quantified (Table 1). We used the logarithms of the relative expression values for further studies of correlation between promoter structure and the relative expression values. Furthermore, several sequence sets were extracted from the human genome database and used as controls. One set contained 200 promoter sequences from human genes in chromosome 21 (PR). Another set contained far upstream sequences of human genes (based on RefSeq annotation) from chromosome 21 (from -50 to -40 kb) (UPG10kb set). Another set was used for an analysis of exon 3 sequences of human genes (EXON3 set). No functional TF binding sites are within exon 3; thus, these sequences can be used as a TF "site-free" background. A further sequence set containing randomly selected intergenic sequences of chromosome 22 (INTERG set) were also used and considered in our analysis.

Databases. Two databases for prediction of gene expression were used (BIOBASE GmbH, Wolfenbüttel, Germany). TRANSFAC is a database that collects information about gene regulation in eukaryotes based on binding of transcription factors to their target sites. We used TRANSFAC Professional release 6.4 and TRANSCompel (release 6.4), which contains known composite regulatory elements in mammalian genes.

Weight Matrix Method for Recognition of AhR Sites. The most widely used methods for recognition of transcription factor binding sites is the application of positional weight matrices (PWMs) (Quandt et al., 1995Go; Whitlock, 1999Go). TRANSFAC is the largest collection of weight matrices for eukaryotic transcription factors (Wingender et al., 2001Go) (http://www.biobase.de and http://www.gene-regulation.com). In this database, three different weight matrices for AhR sites were stored before this study (Accession nos. M00139, M00235, and M00237). In the current work, a novel weight matrix was constructed on the basis of 25 known, experimentally verified, genomic AhR-binding sites (TRANSFAC accession no. M00778; id V$AHR_Q5). All these matrices were used for searching potential AhR sites in genomic sequences. For this search, we used the MATCH algorithm calculating scores for the matches by applying the so-called information vector (Kel et al., 2003Go).

Software Tools for Searching TF Binding Sites. We used the TRANSPLORER software package (BIOBASE GmbH) to identify potential TF binding sites. This software uses a PWM collection and the most up-to-date library of matrices derived from the TRANSFAC Professional database. A TF binding site search can be made using all or a particular subset of matrices from this library. Such matrix subsets with defined cutoff values for every matrix are called "profiles" in TRANSPLORER.

We used a profile that included matrices for different transcription factors of vertebrate organisms (TRANSFAC release 6.4) known to be involved in gene regulation in liver, in immune cells, and in the cell cycle. The cutoff values for the matrices are set to a very low value to minimize the rate of false negative predictions.

Method for Revealing Short Sequence Motifs in the Flanking Regions of TF Sites (Local Context). To analyze the flanking regions of AhR sites, we applied an algorithm that was developed recently (Kel et al., 1999Go, 2001Go). The composition of over- and under-represented short oligonucleotides (local context) is investigated in the flanking regions of functional AhR sites (50 bp to the left and right of the 11-bp-long AhR sites). This local context is taken into account while searching genomes for potential AhR sites.

The applied algorithm is based on a comparison of two sets of sequences of equal length L: a training set Y consisting of the functional AhR sites, including their flanking regions; and a negative control set N consisting of the sequences from "nonpromoter" regions of the human genome. To compile the N set, we computed a MATCH search in the INTERG sets using the AhR matrix (accession no. M00778; score value q > qcutoff = 0.98, it gives approximately 1 match per 5 kb of genomic sequence, as it was tested on the chromosome 22). One hundred of randomly selected matches together with their flanks were placed into the control set N. Thus, set N consists of sequences that contain a motif fitting to the AhR matrix; however, because of its position in the genome most of these "sites" should not be functional. This way, by comparison of sets Y and N, we can reveal features that characterize the sequence environment (context) of functional AhR sites. As contextual features, we consider the frequency of occurrence of short motifs {lambda} = (a1a2..ak) (a {epsilon}{A,T,G,C, W,S,R,Y,M,K,B,V,H,D,N}1) of the length k ≤ 4 in a window w = [t1,t2] (0 < t2 < t1 < L - k + 1).

In our previous work (Kel et al., 1993Go, 2001Go), we described a statistical method that permits an identification of motifs {lambda} and the windows w that are characterized by a significant difference of their frequencies f({lambda},w,S) in the sequences S from the sets Y and N. The motifs found are used then for creating a context analyzer that is able to perform an additional filtering of the potential sites revealed by the weight matrix method.

The context analyzer is developed by means of linear discriminant analysis. The motifs selected at the previous step ({lambda}1, {lambda}2,..., {lambda}m) are used for construction of a linear classification function discriminating sets Y and N. So, for every sequence X, we calculate the score of context:

(1)

where {alpha}i and {beta} are the coefficients of the discriminating function.

A Genetic Algorithm to Determine Composite Regulatory Modules. We define a composite module CM as a set of TF weight matrices with given matrix cutoffs and other parameters that is associated with a specific functional type of gene regulatory regions. We have developed a new computational method to determine CMs in a set of promoter (or other regulatory) sequences of regulated genes. This method is based on a genetic algorithm (a prototype of this method is incorporated in the tool package ClusterScan (Kel-Margoulis et al., 2002bGo). The CMs are characterized by the following parameters: K, the number of PWMs in the module (typically 6-12), cutoff values q(k)cutoff, relative importance values {varphi}(k) and maximum number of best matches {kappa}(k) that are assigned to every weight matrix k (k = 1,K) in the CM. These K matrices are selected by the program from a library of all considered matrices. We use different profiles, including the profile vertebrate_minFN62.prf, which includes 410 matrices for different transcription factors of vertebrate organisms (TRANSFAC release 6.4). Some matrices are organized in pairs. Such pairs are designed to model composite elements that play a very important role providing synergistic or antagonistic regulation of transcription through binding of different transcription factors. (Kel et al., 1995Go). A parameter R is defined that puts a limit on a distance between matches of these matrix pairs (at least one pair of matches should be found fitting this limit). When all these parameter settings can be defined a "composite module score" (CM score) is given for any sequence X using the following equation:

(2)

where qi(k) (X) are the {kappa}(k) best scoring sites found in the sequence X by the matrix (k). Then the genetic algorithm is used to determine the specific parameters of CMs for a particular set of promoters. The general description of the genetic algorithm was reported in Kel et al. (2001Go).

We define the goal function G as a weighted sum of false negative and false positive errors and determine the statistical significance (t test) over several random iterations of bootstrap procedure by splitting the initial sequence sets into a training and a testing subset. In addition, we test the normal distribution of the F function over the set of positive and negative sequences. Such thorough way of calculating the goal function using bootstrap test and normality check allows to assess the usability of the obtained solutions for classification of individual sequences. Because of the small size of the sequence set, we decided to use the complete set for the training and apply the thorough analysis of distribution rather then to sacrifice a part of the set for independent testing. Nonetheless, the final biological and truly independent test of the method is done by the complete genome screening and by comparing results with microarray data (see Results).

The program CMFinder (cmf) takes as an input two sets of sequences (the set which is analyzed and a background set) and a set of weight matrices for transcription factors. For defined parameters K and R and over a number of iterations, the program optimizes the set of matrices selected, their number, their cutoff values, the relative importance, and the maximum number of best matches. The user can vary parameters K and R and compare results of the program. The output of the program is a profile ready to run by MATCH or TRANSPLORER.

Method to Determine CMs Correlated with Gene Expression. The method is based on a modification of the genetic algorithm described in the previous section. The program CMFinder now searches for a combination of TF matrices that minimizes a goal function G that is a weighted sum of the least squares deviation between logarithms of gene expression values and the score function FCM described in the previous section and a rank correlation coefficient:

(3)

here, LR = log2(ex1/ex2), which is the log-ratio of the expression values of a particular gene in the experiment versus control (i.e., a signal log-ratio).


    Results
 Top
 Abstract
 Materials and Methods
 Results
 Discussion
 References
 
Construction of a Novel AhR Weight Matrix. We initially analyzed experimental findings on AhR bindings sites in promoters of targeted genes. From the 35 sites reported in 17 genes (human, mouse, and rat), 25 independent sites were selected after exclusion of close homologs (Table 1). A novel algorithm was used to identify an optimal length of the matrix (Kel et al., 2003Go), which comprised 11 nucleotides. This matrix was stored in TRANSFAC (accession no. M00778) and used for further analysis. Figure 1 depicts the constructed AhR weight matrix and the corresponding consensus sequence. Cut-off qcutoff = 0.966 is the maximal cutoff value that allows recognition of all 25 sites of the training set. This cutoff value was used in further analysis.



View larger version (9K):
[in this window]
[in a new window]
 
Fig. 1. AhR weight matrix (TRANSFAC accession no. M00778; id V$AHR_Q5). In each of the 11 positions, the weights for the four nucleotides are given.

 

The core of consensus sequence of the new matrix is "TNGCGTG", which fits exactly to the seven-nucleotide length consensus reported previously (Watson and Hankinson, 1992Go; Bacsi et al., 1995Go). So, with our analysis that was based on much broader collection of known AhR sites, we confirmed the well known consensus. In addition, we constructed the weight matrix that provides information on the nucleotide preferences in a broader, 11-nucleotide, window around this core consensus sequence.

Modulation of Gene Expression after AhR Activation. Microarray and RT-PCR experiments enabled an assessment of gene expression, and the results are given in the Table 1. We selected 114 genes (64 human and 50 rat) that changed their expression upon AhR activation. Among the 114 "changed" genes, we reconfirmed some of the well known targets of AhR with their known binding sites, such as CYP1A1, CYP1B1, Ugt1a1, and c-fos among others. We also confirmed some of the recently reported AhR targets, including Bcl-2, Bcl-xl, BAD, and cyclooxygenase 1, as shown in Table 1.

Analysis of Short Sequence Motifs in the Flanking Regions of AhR Sites (Local Context). To analyze the flanking regions of AhR sites, we computed the Local context as described under Materials and Methods. The analyzed sequences contained a proven AhR site in the center (11 bp in length, aligned according to the best fit to the matrix) and 50 bp flanks left and right from the site (the DNA strand orientation with regard to the TSS site was not considered). We identified 13 characteristics in these sequences (Table 2). Nine of them are over-represented in comparison with the negative control set of sequences (positive characteristics) and four underrepresented (negative characteristics). The negative control set was prepared as described under Materials and Methods by selecting from the INTERG sequences those subsequences of 111-nucleotide length in which the 11 central positions match with the AhR weight matrix (score cutoff q = 0.966).


View this table:
[in this window]
[in a new window]
 
TABLE 2 Over- and underrepresented oligonucleotides found in the flanking regions of AhR sites

The first nine characteristics are `positive' and the last four are `negative'. The sequences considered were of the length 111 bp (11-bp AhR site in the center and 50-bp flanks).

 

As shown in Table 2, the selected motifs are three to four nucleotides in length and contain many ambiguous letters (from the IUPAC 15 letter code). The corresponding windows in which the selected motifs were found to be over-represented/underrepresented are rather different in their length and location (Table 2, second and third column). Some of the windows are located on the left flank, some on the right, and some of them overlap in the central positions.

On the basis of these 13 characteristics (m = 13), we constructed the linear classification function according to eq. 1 discriminating sets of real AhR sites from the false positives in INTERG sequences. We refer to this function as AhR score of local context d.

The advantage of the score of local context d as an additional filter to identify potential AhR sites is demonstrated in Fig. 2. As a first step, we identified the potential AhR sites by using the weight matrix with a definite cutoff value qcutoff. Then, we applied the d score to an extended region surrounding each potential AhR site. The requirement d > 0 (dcutoff = 0.0) is used as an additional criterion for recognition of AhR sites. In Fig. 2, the distributions of d values for the set of real AhR sites (red) and for the set of "false" AhR-like motifs in the intergenic regions (blue) are shown. These two distributions are clearly distinct from each other and by setting the cutoff value to d = 0.0, we can filter out almost 80% of the false sites while losing only 14% of the true positive sites. Thus, this method allows us to increase the correctness of our predictions of AhR sites in genomic sequences.



View larger version (34K):
[in this window]
[in a new window]
 
Fig. 2. Discrimination between real AhR sites with 50-bp flanks from the false positives in genomic sequences. Values of the score of local context d are shown on the abscissa. The numbers of observations are given on the ordinate.

 

Searching for AhR Sites in Genomic Sequences Using Score of Context. We applied our high-precision method for recognition of AhR binding sites to the genomic sequences of genes that were shown to be regulated (up or down) after activation of AhR. It is known that functional AhR sites may be situated in the promoter regions of the AhR target genes, in their introns as well as in far upstream and downstream noncoding regions around the genes. Therefore, we analyzed the G20gene20kb set (see Materials and Methods) that includes 20-kb regions upstream and downstream of the genes. This set contains 40 sequences of the total length 1,855,876 bp.

We searched for AhR sites on either strand using the consecutive application of AhR site weight matrix and the score of context. One hundred ninety-five sites with a frequency of 0.000105072 sites per bp (or approximately 1 site per 10,000 bp) were identified with a matrix score cutoff s = 0.98 and the context score cutoff d = 0.0. Sequences of chromosome 21 with a total length of 34,011,570 bp were used for data curation. The same cutoff values were applied (s = 0.98 and d = 0.0). The total number of identified AhR sites was 1880, which corresponds to 0.00005528 sites per bp (or approximately 1 site per 18,000 bp). Therefore, the frequency of AhR sites in the AhR-regulated promoters is almost 2 times higher compared with overall sequences of the entire chromosome 21.

To check whether the revealed sites have any preferences in their location relative to the start of transcription (TSS position), we calculated the frequencies of predicted AhR sites in different regions around TSS of AhR-regulated genes. It is striking that, for some windows of limited length around the TSS, the frequency of AhR sites is much higher than in the other regions of regulated genes as well as in the whole chromosome 21 (Fig. 3). For instance, within the window [-1000 TSS + 1000] (-1000 upstream and +1000 downstream from the start of transcription), we observed AhR sites with a frequency of 0.0005125 sites per bp (or approximately 1 site per 2000 bp), which is almost 10 times higher than in the whole chromosome 21.



View larger version (15K):
[in this window]
[in a new window]
 
Fig. 3. Frequency of AhR sites found by the program in different genomic sequences. Frequency in the whole 21 chromosome is shown by the yellow bar. Blue bars represent the frequency of the sites in different parts of the AhR-regulated genes.

 

Although the frequency of AhR sites is very high near the start of transcription, it is not the only place where binding sites can be found. They can be found in the promoter regions of AhR-regulates genes (in 5' proximity to the start of transcription), inside the first noncoding exon, and in introns as well as in the far 5' and 3' regions of the genes. A very distinct feature of the location of AhR sites is a preponderance to form clusters. These clusters may contain from k = 4 to 5 up to 10 to 15 sites and the extent of the clusters varies from l = 50 bp up to 1000 bp (see examples in the Fig. 4).



View larger version (18K):
[in this window]
[in a new window]
 
Fig. 4. Some examples of the distributions of the found AhR sites in AhR-regulated sequences. For each cluster, the probability to occur by chance was calculated. Arrows show positions of the start of transcription (according to the RefSeq annotation of the first exon). The scale shows positions relative to the start of transcription. Short lines above the scale show the locations of predicted AhR binding sites.

 

We computed the probability of a given (k,l) cluster to occur just by chance in a sequence of length L given the total number of sites K in this sequence. The probability is calculated according to the algorithm described previously (Kel et al., 2001Go). We found statistically significant AhR-site clusters in about 40% of AhR-regulated genes. Among them are the following genes: ALDH3A1(human), six sites inside 66 bp, from +1540 to +1606; BAD(human), six sites inside 338 bp, from -7519 to -7181; and NR3C1 (human), six sites inside 765 bp, from -893 to -128) and many others.

Composite Modules Revealed in Promoters of AhR-Regulated Genes. To reveal novel composite modules, the local nucleotide sequence of promoter regions of AhR-regulated genes was analyzed. Several subregions around TSSs were studied [e.g., (-2000/+2000), (-1000/+100), and (-500/+100)], and we applied different parameters for searches with the CompositeModuleFinder. We compared the structure of AhR-regulated promoters to a set of promoters randomly taken from the human genome. An extended table with the results of this analysis is available on request. Among the most discriminative matrices were those recognizing HNF1, AhR, GR, OCT, and C/EBP (Table 3). It is interesting that the AhR matrices selected by the algorithm were in the set of promoters (-1000/+100) only. The algorithm did not select shorter and longer promoter sequences. This indicates that not only the presence of a binding site is important but also its relative position within the promoter.


View this table:
[in this window]
[in a new window]
 
TABLE 3 List of factors the matrices of which were selected by the algorithm in 5' regions of different length

 

Among the most prominent matrix pairs that were selected by the algorithm are HNF1/Sp1 and AP-1/NF-1 for the maximum distance r = 100 bp; E2F/NF-1, AhR/Myb, HNF3/NFY, HNF6/NF-{kappa}B, and Sp1/Myb for the maximum distance r = 50 bp; and HNF-1/GR for the maximum distance r = 40 bp. It is noteworthy that not all matrices found individually were also found in pairs. Thus, a composite module for promoters could be defined.

In Table 4, a combination of individual matrices and matrix pairs for a set of promoters of AhR regulated genes (-2000/+2000) are listed. One matrix (YY1) and two pairs (E2F/ROR and AhR/CREB) were selected by the algorithm, because their frequencies in the AhR-responsive promoters were lower than in others. All other matrices and pairs of matrices were selected to define an AhR-associated composite regulatory module CMAhR in the promoters investigated.


View this table:
[in this window]
[in a new window]
 
TABLE 4 Composite module CMAhR constructed for the (–2000 +2000) set of promoters

 

As given in Fig. 5, the distribution of the composite module score FCM is distinctly different (t test value is 12.12, p value = 1.7 x 10-28) and demonstrates robust recognition of AhR-responsive promoters. A summary of sites and site pairs from CMAhR found in AhR-regulated genes is given in Table 5.



View larger version (22K):
[in this window]
[in a new window]
 
Fig. 5. Discrimination between (-2000 to +2000) promoters (black bars) and background promoters from chromosome 21 (PR: -2000 to +2000) (striped bars) by the composite module CMAhR. The composite module score FCM is given on the abscissa, and the numbers of observations are given on the ordinate.

 

View this table:
[in this window]
[in a new window]
 
TABLE 5 Summary of sites, site pairs found in promoters (-2000 +2000) and clusters found in 5' regions (-20 kb + 20 kb) of AhR-regulated genes and other genes for transcription factors involved

 

The composite structure of an AhR-regulated promoter, e.g., the human ALDH3A1 gene, is given in Fig. 6. Identified sites, site pairs, and the AhR site cluster are all distributed over the whole region of 4 kb around the TSS. Comparison of this region between different species reveals that only the proximal promoter is conserved. There were no detectable homologies between human and mouse/rat in the other parts of this 4-kb region. It seems that the sites predicted in those parts are specific for the human gene only. In previous experimental studies, several AhR sites as well as NF-1 and GR sites were detected in different 5' upstream regions of rat and mouse ALDH3A1 gene (Vasiliou et al., 1999Go). Here, we confirmed that the proximal promoter of this gene has a typical AhR-regulation-associated structure.



View larger version (23K):
[in this window]
[in a new window]
 
Fig. 6. Analysis of potential TF binding sites in the 5' region of the human ALDH3A1 gene. Locations of the first exon of the transcripts are shown in yellow. a, Map of potential TF binding sites found in the region of the gene from -2000 to +2000. Potential composite elements AhR/Myb, AhR/NF-1, Myb/SP-1, and NF-1/AP-1 are marked. A cluster of AhR sites in +1500 region is also marked. b, phylogenetic footprinting of a region near start of transcription (TSS) for three species (human, mouse, and rat). Potential CE AhR/Myb is conserved among these three species. The experimentally known AhR site in the mouse gene is underlined. The predicted sites for HNF-3B, C/EBP, and USF are also conserved for three species. NF-1 and second MyB (both shown by a single-dashed line) can be identified in human and mouse sequences, but not in rat.

 

Prediction of Gene Expression Based on the Composite Modules. Next, we addressed the differences in the structure of the regulatory regions between up- and down-regulated genes. The dynamics of regulation is actually quite different. Therefore, we aimed to identify transcription factors that influence the dynamics of up- or down-regulation in response to AhR protein-DNA interaction.

We thus compared two sets of promoters: for the up-regulated genes ExpressUP and for down-regulated genes ExpressDOWN (taking the regions of promoters from -2000 to +2000). The relative expression level was considered as additional information for this analysis.

With the help of the Composite Module Finder Express, we determined a set of matrices that correlates with gene expression in the two sets of regulated genes (Table 6). Patterns with positive "impact" values {varphi} are over-represented in promoters of up-regulated genes (shown in bold). Patterns with negative impact values are over-represented in promoters of down-regulated genes (underlined).


View this table:
[in this window]
[in a new window]
 
TABLE 6 Composite module E001 that correlates with gene expression in the G4000ExpressUP and G4000ExpressDOWN sets

 

A linear function is computed (see Materials and Methods, eq. 2) that predicts the level of up- or down-regulation of a gene based on the sequence of its promoter. This function takes into account impact values for selected patterns given in Table 6. We obtained a significant correlation when the logarithm of the experimentally determined gene expression levels were plotted against predicted ones (Fig. 7).



View larger version (14K):
[in this window]
[in a new window]
 
Fig. 7. Correlation between real relative expression values (log2 of the relative expression) and predicted ones. Prediction is done on the basis of the selected combinations of potential TF binding sites in the promoter regions of the genes. Different symbols represent different promoters where AhR sites were found. Diamond, by two AhR matrices; and square, by only one.

 

Identification of Genomic Target for the AhR. The study of promoters of AhR-regulated genes and the hierarchy of AhR sites enabled us to develop a definition of a novel genetic algorithm. The rules of this algorithm are given below and were applied to search for new AhR targets within the human genome.

  1. Promoters of 1100 bp in length around the tentative TSS (based on available RefSeq annotation of genes) were retrieved. Only genes with annotated 5'-untranslated regions longer than 150 bp were considered. This enhances the reliability of the selected TSSs, as shorter 5'-untranslated regions annotations frequently result from nonfull-length mRNAs. By this approach, 9246 promoters were retrieved from the human genome.
  2. We searched for potential AhR binding sites using the newly generated matrix (cutoff 0.966). Af first, 9982 sites were found, of which 3999 sites passed through the local context filter.
  3. For every gene that contains at least one potential AhR site within a region of 20 kb upstream and downstream of the TSS clusters with k = 2 to 7, covering the regions from L = 6 to 800 bp were computed. The probability for clusters to occur by chance was calculated as well. Finally, we selected only genes that have at least one cluster of AhR sites with a probability less than 0.1. With this approach 864 genes were selected. The full list of genes is available upon request.
  4. CMs were identified with the CMFinder program. So far, 194 potential target genes were identified that matched both composite modules.
  5. A list of 71 predicted target genes with the highest prediction scores are present in Table 7. Furthermore, expression of predicted genes was verified experimentally by RT-PCR and/or microarray studies. The results are given in Table 7, and we demonstrate approximately 75% concordance between predicted regulation and experimentally verified change in expression.


View this table:
[in this window]
[in a new window]
 
TABLE 7 Seventy-one potential target genes for AhR regulation selected by the highest values of different prediction scores

 


    Discussion
 Top
 Abstract
 Materials and Methods
 Results
 Discussion
 References
 
We developed a computational method to investigate promoters of genes that are regulated by the Ah-receptor. Our approach allows specific features of regulated promoters to be identified and to use this knowledge for the identification of novel target genes in various genomes. The developed method is based on a concept of hierarchical composite structure of eukaryotic gene regulatory regions.

The Concept of the CMs. Functionally related genes involved in the same molecular-genetic, biochemical, or physiological process are often regulated coordinately by specific combinations of transcription factors. Dynamic function-specific complexes of many different transcription factors, so called enhanceosomes (Merika and Thanos, 2001Go), are formed at gene promoters and enhancers driving gene expression in specific manners. At the level of DNA, the blueprints for assembling of such variable TF complexes on promoter regions may be seen as specific combinations of TF binding sites located in proximity to each other. We call such structures "composite regulatory modules". The various types of composite modules can be classified according to their ability to regulate gene expression. There may be several different types of CMs located in the regulatory region of one gene, that can be spaced (e.g., liver specific and muscle-specific enhancers of one gene) or overlapping. We also consider the hierarchy of the CMs based on the number of different TFs involved and the length of the sequence covered. Composite elements (CEs) consisting of two/three closely located sites belong to the lowest hierarchical level of CMs. CMs of a higher level may include more sites as well as CMs of a lower level such as CEs.

A number of approaches identifying composite motifs have been described: BioProspector (Liu et al., 2001Go), CoBind (GuhaThakurta and Stormo, 2001Go), MITRA (Eskin and Pevzner, 2002Go), and dyad search (van Helden et al., 2000Go). These programs help to discover new regulatory sites for yet unknown transcription factors, but an "ab initio" motif finding method is limited by the length of sequences and may not be suitable for the analysis of long regulatory regions of genes of human and other higher eukaryotic organisms. A valuable source to identify transcription factor binding sites is the TRANSFAC database (http://www.biobase.de) (Wingender et al., 2001Go).

We developed a new method for detecting de novo composite modules using information from TRANSFAC and the weight matrices for many different transcription factor families. It is a rather complex computational task to find combinations of weight matrices that are characteristic for a particular group of genes. Many variables are included in the model, such as the number of matrices in a module, the maximal number of matches considered, the distances between matches, and cutoff values for the matrices and their relative impact on the composite score. To find an optimal structure of the model we applied a genetic algorithm approach that was shown to be efficient in finding solutions. One of the reasons to use a genetic algorithm was the need to perform simultaneous fitting of the modeled parameters and to reduce the number of parameters. Because of the large number of possible matrix combinations it is impractical to use canonical multidimensional regression (linear or logit) without prior reduction of the number of parameters. Of course, when selecting a heuristic algorithm for optimization one must be aware that it finds only local optima and solutions can vary in different runs of the program. The selection of a proper goal function is very important for this approach. We constructed a robust goal function using some ideas of "utility calculations" (Kel et al., 1993Go). The requirement for normality and bootstrap procedures (see Materials and Methods) provides the correct environment to find composite modules that describe the whole set of promoters and not just a particular subset.

The success of applying such tools greatly depends on how well the general model of the promoter we are optimizing actually fits to reality. Because of the limitations of data here, we consider only relatively simple models. To improve the model in future, we will consider such additional parameters as mutual distances between each site in the CM, their orientations to each other, and location relative to the start of transcription. All these and many other parameters could be easily incorporated in the model and optimized by the help of the genetic algorithm.

The algorithm that we applied identified several factors, including, SP1, NF-{kappa}B, and C/EBP. It is interesting that the AhR matrix alone was insufficient for the description of AhR-regulated promoters. Thus, specific combinations with other factors provided the best discrimination for detecting AhR-regulated genes.

We also performed computational tests with the cmf program on virtual data. We generated a set of random sequences by shuffling randomly nucleotides in every sequence of the G4000 set. Then, we implanted several binding sites of certain transcription factors by placing them at random. We used the cmf program to compare the sequence set with the implanted sites and those selected at random. We varied the minimal scores of the sites and their frequencies to find the borders. The result of this simulation (Table 8) shows that the cmf program was able to determine correctly all TF matrices used for site implantation, although the more matrices were used for implantation the more difficult it became for the program to reveal the sites correctly. It is noteworthy that matrices were found to behave differently in the test. For instance, the matrices for C/EBP and OCT were easily identified whereas recognition of the matrix for AhR factors was more difficult, and the AP-1 matrix was most difficult to identify. Similarity between different matrices in the library is an obvious confounding factor. Unfortunately, the algorithm can not identify different members belonging to the same TF family. For example, in the case of C/EBP, identification of a particular member (e.g., C/EBP-{alpha}, -{beta}, or -{gamma}) was impossible. Nevertheless, specific combinations of transcription factors are indicative of a particular regulatory mechanism.


View this table:
[in this window]
[in a new window]
 
TABLE 8 Results of testing CMFinder on simulated data

 

Another test performed was the "knocking-out" of those matrices most important for AhR promoter activation (i.e., distortion of six matrices: AhR, C/EBP, OCT, AP-1, NF1, and HNF1) by exchanging values for nucleotides A versus T and C versus G. After such distortion (without changing the C/G content), the program was run again with complete library where these six matrices were "knocked-out". No significant differentiation could be identified between AhR-regulated promoters and other promoters. This proved the specificity of our algorithm and indicated that the matrices we identified were indeed important for recognition of the AhR-regulated genes.

Our composite module contained matrices for AhR, PPAR, HNF-6, STAT, ROR, and ETS. It is interesting that three different AhR matrices were included by the algorithm in the CM. This suggests that sites for AhR in the promoters play an important role in the up- or down-regulation of these genes. V$AHR_01 got the maximal impact value. It seems to be specific for several cytochrome P450 genes and influences their expression in response to AhR. V$AHRARNT_01 was constructed on the basis of data from SELEX experiments, whereas V$AHR_Q5 was constructed in the current study on the basis of genomic sites. These two matrices have very different impact values: the first matrix has a positive and the second matrix a negative impact value. Comparison of the structure of these two matrices shows that they are very similar in the core but differ in some nucleotides at the flanks. For example, in the position 10 in the matrix V$AHR_Q5 (Fig. 1), the most prominent nucleotide is G, whereas in the matrix V$AHRARNT_01 (see TRANSFAC) nucleotide G in the corresponding position is absolutely "for-bidden". This could influence binding of some other factors such as repressors in the vicinity of AhR sites.

The combination of matrices and matrix pairs included in the CMs revealed a network of transcription factors governing the AhR-dependent gene regulation. Figure 8 depicts this network. Some of the genes of the transcription factors included in the CM, such as C/EBP-{alpha}, HNF1-{beta}, HNF3, HNF4-{gamma}, OCT-1, are regulated by AhR as shown previously (Borlak et al., 2002Go, 2003Go). Analysis of their promoters using the CM points to an intriguing network of transcription factors with several feedback loops and a hierarchy of regulatory signals. This network of transcription factors can also explain the regulation of several genes that are not direct targets of AhR binding. Their regulation can be mediated through other TFs whose expression is regulated directly by AhR (Fig. 8).



View larger version (51K):
[in this window]
[in a new window]
 
Fig. 8. Scheme of proposed network of transcription factors involved in regulation of gene expression under induction by Ah-receptor.

 

On the genome scale, to search for potential new AhR target genes, we applied very stringent set of criteria to avoid high rate of false positives. In addition to the high score of the CM match, we require AhR sites with the proper oligonucleotide context to be found in the promoters as well as presence of clusters of AhR sites. With such stringent criteria, we definitely miss the indirect targets—genes whose AhR-dependent regulation is mediated through other TFs. As a result, a high proportion of true positive prediction is to be expected as it is confirmed by comparison with the results of microarray studies.

In conclusion, a computational method was developed for an ab initio identification of AhR-regulated genes. The applied algorithm revealed a specific motif within a distinct promoter region near the AhR sites. This finding enabled us to identify clusters of regulatory proteins. It is noteworthy that the genetic algorithm predicts the level of transcript expression and may be used for the identification of genes regulated by AhR.


    Acknowledgements
 
We are indebted to Olga Kel-Margoulis (BIOBASE GmbH) for fruitful discussion of the results, Tatjana Ivanova Institut (Cytology and Genetics, Novosibirsk) for testing of the genetic algorithm approach, Birgit Lewicki-Potapov (BIOBASE GmbH) and Rüdiger Zemlin (Fraunhofer Institute, Hannover, Germany) for the help in preparing sequence sets, and Asili Barre-Dirrie for the help in collection of AhR-binding sites.


    Footnotes
 
This work was mainly funded by BIOBASE GmbH. Parts of this work was supported by Siberian Branch of Russian Academy of Sciences, by grant of Volkswagen-Stiftung (I/75941) and by a grant of the Lower Saxony ministry of culture and science to J.B. P.N. was supported by the Alexander von Humboldt Foundation.

ABBREVIATIONS: TF, transcription factor; AhR, aryl hydrocarbon receptor; Arnt, aryl hydrocarbon receptor nuclear translocator; Ah, aryl hydrocarbon; RT-PCR, reverse transcription-polymerase chain reaction; PCR, polymerase chain reaction; TSS, transcription start site; kb, kilobase(s); bp, base pair(s); PWM, positional weight matrice; CM, composite regulatory module; CE, composite element; NF-{kappa}B, nuclear factor-{kappa}B.

1 We are using the following one-letter code for different combinations of alternative nucleotides: W, (A/T, read A or T); R, (A/G); M, (A/C); K, (T/G); Y, (T/C); S, (G/C); B, (T/G/C);V, (A/G/C); H, (A/T/C); D, (A/T/G); and N, (A/T/G/C). Back

Address correspondence to: Prof. Dr. Jürgen Borlak, Fraunhofer Institute of Toxicology and Experimental Medicine, Center for Drug Research and Medical Biotechnology, Nikolai-Fuchs-Str. 1, D-30625 Hannover, Germany. E-mail: borlak{at}item.fraunhofer.de


    References
 Top
 Abstract
 Materials and Methods
 Results
 Discussion
 References
 
Boehlk S, Fessele S, Mojaat A, Miyamoto NG, Werner T, Nelson EL, Schlondorff D, and Nelson PJ (2000) ATF and Jun transcription factors, acting through an Ets/CRE promoter module, mediate lipopolysaccharide inducibility of the chemokine RANTES in monocytic Mono Mac 6 cells. Eur J Immunol 30: 1102-1112.[CrossRef][Medline]

Bacsi SG, Reisz-Porszasz S, and Hankinson O (1995) Orientation of the heterodimeric aryl hydrocarbon (dioxin) receptor complex on its asymmetric DNA recognition sequence. Mol Pharmacol 47: 432-438.[Abstract]

Borlak J, Dangers M, and Thum T (2002) Aroclor 1254 modulates gene expression of nuclear transcription factors: implications for albumin gene transcription and protein synthesis in rat hepatocyte cultures. Toxicol Appl Pharmacol 181: 79-88.[CrossRef][Medline]

Borlak J, Hock A, Hansen T, and Richter E (2003) DNA adducts in cultures of polychlorinated biphenyl-treated human hepatocytes. Toxicol Appl Pharmacol 188: 81-91.[CrossRef][Medline]

Brazma A, Vilo J, and Ukkonen E (1997) Finding transcription factor binding site combinations in yeast genome, in Computer Science and Biology (Frishman D and Mewes HW eds) pp 57-59, Proceedings of the German Conference on Bioinformatics GCB-97, Martinsried, Germany.

Dunn JC, Tompkins RG, and Yarmush ML (1991) Long-term in vitro function of adult hepatocytes in a collagen sandwich configuration. Biotechnol Prog 7: 237-245.[CrossRef][Medline]

Eskin E and Pevzner PA (2002) Finding composite regulatory patterns in DNA sequences. Bioinformatics 18 (Suppl 1): S354-S363.[Abstract]

Fessele S, Boehlk S, Mojaat A, Miyamoto NG, Werner T, Nelson EL, Schlondorff D, and Nelson PJ (2001) Molecular and in silico characterization of a promoter module and C/EBP element that mediate LPS-induced RANTES/CCL5 expression in monocytic cells. FASEB J 15: 577-579.[Free Full Text]

Frech K, Quandt K, and Werner T (1998) Muscle actin genes: a first step towards computational classification of tissue specific promoters. In Silico Biol 1: 29-38.[Medline]

Guha Thakurta D and Stormo GD (2001) Identifying target sites for cooperatively binding factors. Bioinformatics 17: 608-621.[Abstract/Free Full Text]

Kel A, Kel-Margoulis O, Babenko V, and Wingender E (1999) Recognition of NFATp/AP-1 composite elements within genes induced upon the activation of immune cells. J Mol Biol 288: 353-376.[CrossRef][Medline]

Kel AE, Gossling E, Reuter I, Cheremushkin E, Kel-Margoulis OV, and Wingender E (2003) MATCH: a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res 31: 3576-3579.[Abstract/Free Full Text]

Kel AE, Kel-Margoulis OV, Farnham PJ, Bartley SM, Wingender E, and Zhang MQ (2001) Computer-assisted identification of cell cycle-related genes: new targets for E2F transcription factors. J Mol Biol 309: 99-120.[CrossRef][Medline]

Kel AE, Ponomarenko MP, Likhachev EA, Orlov YL, Ischenko IV, Milanesi L, and Kolchanov NA (1993) SITEVIDEO: a computer system for functional site analysis and recognition. Investigation of the human splice sites. Comput Appl Biosci 9: 617-627.[Abstract/Free Full Text]

Kel OV, Romaschenko AG, Kel AE, Wingender E, and Kolchanov NA (1995) A compilation of composite regulatory elements affecting gene transcription in vertebrates. Nucleic Acids Res 23: 4097-4103.[Abstract/Free Full Text]

Kel-Margoulis O, Kel AE, Reuter I, Deineko IV, and Wingender E (2002a) TRANSCompel: a database on composite regulatory elements in eukaryotic genes. Nucleic Acids Res 30: 332-334.[Abstract/Free Full Text]

Kel-Margoulis OV, Ivanova TG, Wingender E, and Kel AE (2002b) Automatic annotation of genomic regulatory sequences by searching for composite clusters. Pac Symp Biocomput 187-198.

Liu X, Brutlag DL, and Liu JS (2001) BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput 127-138.

Merika M and Thanos D (2001) Enhanceosomes. Curr Opin Genet Dev 11: 205-208.[CrossRef][Medline]

Quandt K, Frech K, Karas H, Wingender E, and Werner T (1995) MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res 23: 4878-4884.[Abstract/Free Full Text]

Safe S and Wormke M (2003) Inhibitory aryl hydrocarbon receptor-estrogen receptor alpha cross-talk and mechanisms of action. Chem Res Toxicol 16: 807-816.[Medline]

Singh-Gasson S, Green RD, Yue Y, Nelson C, Blattner F, Sussman MR, and Cerrina F (1999) Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array. Nat Biotechnol 17: 974-978.[CrossRef][Medline]

Tronche F, Ringeisen F, Blumenfeld M, Yaniv M, and Pontoglio M (1997) Analysis of the distribution of binding sites for a tissue-specific transcription factor in the vertebrate genome. J Mol Biol 266: 231-245.[CrossRef][Medline]

van Helden J, Rios AF, and Collado-Vides J (2000) Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res 28: 1808-1818.[Abstract/Free Full Text]

Vasiliou V, Reuter SF, Williams S, Puga A, and Nebert DW (1999) Mouse cytosolic class 3 aldehyde dehydrogenase (Aldh3a1): gene structure and regulation of constitutive and dioxin-inducible expression. Pharmacogen 9: 569-580.

Wasserman WW and Fickett JW (1998) Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol 278: 167-181.[CrossRef][Medline]

Watson AJ and Hankinson O (1992) Dioxin- and Ah receptor-dependent protein binding to xenobiotic responsive elements and G-rich DNA studied by in vivo footprinting. J Biol Chem 267: 6874-6878.[Abstract/Free Full Text]

Whitlock JP (1999) Induction of cytochrome P4501A1. Annu Rev Pharmacol Toxicol 39: 103-125.[CrossRef][Medline]

Wingender E, Chen X, Fricke E, Geffers R, Hehl R, Liebich I, Krull M, Matys V, Michael H, Ohnhäuser R, et al. (2001) The TRANSFAC system on gene expression regulation. Nucleic Acids Res 29: 281-283.[Abstract/Free Full Text]




This article has been cited by other articles:


Home page
Brief BioinformHome page
E. Wingender
The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation
Brief Bioinform, July 1, 2008; 9(4): 326 - 332.
[Abstract] [Full Text] [PDF]


Home page
J. Pharmacol. Exp. Ther.Home page
R. Wu, L. Zhang, M. S. Hoagland, and H. I. Swanson
Lack of the Aryl Hydrocarbon Receptor Leads to Impaired Activation of AKT/Protein Kinase B and Enhanced Sensitivity to Apoptosis Induced via the Intrinsic Pathway
J. Pharmacol. Exp. Ther., January 1, 2007; 320(1): 448 - 457.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
T. Waleev, D. Shtokalo, T. Konovalova, N. Voss, E. Cheremushkin, P. Stegmaier, O. Kel-Margoulis, E. Wingender, and A. Kel
Composite Module Analyst: identification of transcription factor binding site combinations using genetic algorithm.
Nucleic Acids Res., July 1, 2006; 34(Web Server issue): W541 - W545.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
A. Kel, T. Konovalova, T. Waleev, E. Cheremushkin, O. Kel-Margoulis, and E. Wingender
Composite Module Analyst: a fitness-based tool for identification of transcription factor binding site combinations
Bioinformatics, May 15, 2006; 22(10): 1190 - 1197.
[Abstract] [Full Text] [PDF]


Home page
Toxicol SciHome page
D. C. Volz, D. E. Hinton, J. M. Law, and S. W. Kullman
Dynamic Gene Expression Changes Precede Dioxin-Induced Liver Pathogenesis in Medaka Fish
Toxicol. Sci., February 1, 2006; 89(2): 524 - 534.
[Abstract] [Full Text] [PDF]