Department of Molecular Biology, The Scripps Research Institute, La Jolla, California (P.B., C.F.B.); and Department of Pharmacology and Toxicology, University of Arizona College of Pharmacy, Tucson, Arizona (D.S.)
Received May 14, 2004; accepted August 25, 2004
| Abstract |
|---|
TFs are multidomain proteins typically composed of a DNA binding domain (DBD), responsible for specific contacts with DNA bases, and an effector domain (ED) that mediates activation or repression of targeted genes. Some TFs contain additional post-transcriptional regulatory elements, such as dimerization domains and phosphorylation sites (Ciarapica et al., 2003). TFs exert their action by binding to specific DNA sequences in chromatin and recruiting appropriate global coactivator and corepressor regulatory complexes. TF activator complexes include Mediator, which interacts with core promoter factors; p300/CREB-binding protein-associated factor and p300/CREB-binding protein, which contain histone acetyltransferases that modify nucleosomes to a transcriptionally active state; and the SWI/SNF chromatin remodeling complex, which repositions nucleosomes to enable additional TF binding (Gebuhr et al., 2003). Examples of TF repressor complexes are Sin3-HDAC and NuRD, which contain histone deacetylases that modify nucleosomes to a transcriptionally inactive state (Ansari and Mapp, 2002).
Given the pivotal role of TFs in controlling cell fate, their aberrant expression or incorrect processing contributes to the progression of a variety of diseases, including developmental abnormalities and cancer. The TF p53 is the most commonly mutated gene in human cancer (Harms et al., 2004). Several chromosomal translocations in acute myeloid leukemia generate chimeric TFs by linking a DBD from one TF with a repression domain of another; the chimera triggers abnormal target gene regulation (Steffen et al., 2003). Another example is the altered regulation of the STAT (signal transducer and activator of transcription) family of TFs. Constitutive activity of STAT proteins or expression of C-terminally mutated STATs, particularly STAT3 and STAT5, contributes to malignancy and cellular transformation (Benekli et al., 2003).
| Artificial TF Design |
|---|
NF-κB subunit p65 (Liu et al., 2001
Various scaffold molecules have been used for the generation of DBDs. Specific DNA recognition has been successfully achieved using several synthetic approaches: polyamides, triple-helix-forming oligonucleotides, and peptide nucleic acids (reviewed by Uil et al., 2003). These DBDs are connected to short activation or repression domains via flexible or rigid linkers (Arora et al., 2002). The advantage of these synthetic approaches is the small size of the TF, which can facilitate both synthesis and cellular uptake. In addition, synthetic DBDs have demonstrated a high level of affinity, permitting not only regulation of targeted promoters but also specific competition with endogenous TFs (Chiang et al., 2000; Bremer et al., 2001; Coull et al., 2002; Ehley et al., 2002; Stanojevic and Young, 2002; Wurtz et al., 2002; Fechter and Dervan, 2003; Yang et al., 2003). DNA microarray experiments have shown that polyamides seem to be able to regulate a limited number of genes in lymphoid cells (Dudouet et al., 2003).
| Using Protein Scaffolds: Zinc Finger Domains |
|---|
α-helix is packed against two antiparallel β-strands, and additional stability is provided by the coordination of a zinc ion by the side chains of two cysteine and two histidine residues (Miller et al., 1985
α-helix make specific contacts with DNA bases in the major groove. ZF domains have been useful for the construction of specific DNA binding proteins primarily because of two properties: sequence specificity and modularity (Fig. 2A). The structure of the Zif268-DNA complex (Pavletich and Pabo, 1991
α-helix) contacted the 3' base. These base contacts are made with only one strand of the DNA duplex. A cross-strand contact involves position +2 in the α-helix (Asp2) and a cytosine or adenine base in the adjacent complementary triplet on the opposite DNA strand. This interaction has been shown to restrict the modularity of this family of ZF proteins (Isalan et al., 1997
| Isolation of Sequence-Specific ZFs |
|---|
Specific ZF sequences were obtained that displayed good specificity for purine-rich triplets (Segal et al., 1999; Dreier et al., 2000, 2001). However, selection of ZF sequences capable of binding specifically to C- or T-containing triplets, especially those with C or T as the 5' nucleotide, has been more challenging. This is true in part because purine bases (G and A) offer more hydrogen-bonding possibilities than pyrimidines and in part because the amino acids that would recognize C and T typically have short side chains and thus cannot easily span the distance from position 6 in the α-helix to the 5' DNA base.
Phage display methods have also been applied to select for domains or proteins that recognize specialized nucleic acid structures, such as methylated DNA (Choo, 1998), quadruplex DNA (Isalan et al., 2001b), and noncanonical duplex RNA (Blancafort et al., 1999). However, proteins recognizing such exotic structures have yet to find utility comparable with that of their duplex-DNA-binding counterparts.
Other methods to screen for novel DNA binding ZF domains take advantage of the yeast one-hybrid system, as shown for the cell-based selection of ZFs that bind sequences in the MDR-1 promoter (Cheng et al., 1997; Bartsevich and Juliano, 2000). A cell-based ZF selection system was also established in bacteria to optimize multifinger proteins (Hurt et al., 2003). The ZF library used in these studies combined cassette mutagenesis in the ZF helix followed by domain shuffling. Specific ZF proteins were selected in a bacterial two-hybrid system.
| Building Polydactyl ZF Proteins |
|---|
In contrast, we have adopted a helix-grafting strategy, based on the modular property of ZF proteins (Fig. 2, B and C) (Beerli et al., 1998). With the caveat that an overlap contact would need to be accommodated between certain subsets of ZF units, multifinger proteins were constructed by "grafting" the helix regions of modified ZFs onto the scaffold of a highly regular, existing zinc finger protein (Sp1C; Shi and Berg, 1995). The use of a highly regular scaffold ensured that each domain would be displayed on the protein in the same way, allowing assembly of the modified ZF domains in nearly any order. This strategy provided an extremely rapid method for construction of polydactyl ZF proteins without the need for phage display and selection for each new DNA target. Recently, Segal et al. (2003a,b) used this modular strategy to construct more than 80 engineered three-ZF proteins and showed that these novel proteins were able to interact with their predicted DNA binding sites. Nagaoka et al. (2002) used helix grafting to change the specificity of the Sp1 protein (which naturally binds GC-rich regions) to recognize an AT-rich element. The helices used for grafting were derived from the Drosophila melanogaster CF2-II protein, which recognizes AT-rich sequences.
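The modular logic of helix grafting can be illustrated with a short sketch. It is not the authors' software: the lexicon entries and scaffold segments below are placeholder strings rather than published sequences, and the function names are hypothetical. The sketch assumes the antiparallel arrangement seen in the Zif268-DNA complex, in which the N-terminal finger reads the 3'-most triplet, and it ignores the overlap contact mentioned above. Only the TGEKP linker is taken from the text.

```python
# Minimal sketch of modular (helix-grafting) assembly of a polydactyl ZF protein.
# ASSUMPTIONS: lexicon values and scaffold segments are placeholders, not real
# sequences; the N-terminal finger is assumed to read the 3'-most DNA triplet.

LINKER = "TGEKP"  # canonical linker named in the text


def split_triplets(target):
    """Split a 5'->3' target site into consecutive 3-bp subsites."""
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    return [target[i:i + 3] for i in range(0, len(target), 3)]


def assemble(target, lexicon, n_frame="[scaffold-N]", c_frame="[scaffold-C]"):
    """Return a schematic protein string, N- to C-terminus."""
    triplets = split_triplets(target.upper())
    ordered = triplets[::-1]  # finger 1 binds the 3'-most triplet
    missing = [t for t in ordered if t not in lexicon]
    if missing:
        raise KeyError(f"no domain in lexicon for triplets: {missing}")
    # Graft each recognition helix into the same regular scaffold.
    fingers = [n_frame + lexicon[t] + c_frame for t in ordered]
    return LINKER.join(fingers)


# Hypothetical three-entry lexicon with placeholder helices.
demo_lexicon = {"GGG": "<helix-GGG>", "GCC": "<helix-GCC>", "GGA": "<helix-GGA>"}
print(assemble("GGGGCCGGA", demo_lexicon))
```

Because every finger shares the same scaffold, changing the target only changes which helices are looked up and in what order, which is the property that makes the approach fast.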
By increasing the number of ZF units in the multifinger protein, the number of bases targeted can be expanded. It should be possible to target low-frequency, potentially unique sites in the human genome using six or more ZF units (Liu et al., 1997). Barbas and collaborators have constructed polydactyl six-ZF proteins using the five-amino acid canonical linker (TGEKP) between the ZF units (Liu et al., 1997; Segal and Barbas, 2001; Beerli and Barbas, 2002). Kim and Pabo (1998) reported the construction of a six-ZF protein with high specificity and affinity made by joining two three-ZF proteins with a longer, nine-amino acid linker. The increased affinity and specificity provided by the longer linker were attributed to increased flexibility in the six-ZF-DNA complex. Indeed, it has been suggested that shorter linkers could generate a loss of entropy resulting in loss of affinity (Peisach and Pabo, 2003). Moore et al. (2001) described the construction of highly specific six-ZF proteins built from three groups of two-ZF units. The authors modified the linker sequence between the two-ZF units by inserting additional Gly or Ser residues into the canonical linker sequence.
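The argument that six ZF units can address a potentially unique site can be checked with back-of-the-envelope arithmetic. The sketch below is an illustration only: it assumes a haploid human genome of about 3 × 10^9 bp, a random sequence with equal base frequencies, 3 bp read per finger, and sites counted on both strands. Real genomes are not random, so the numbers indicate order of magnitude at best.

```python
# Expected number of chance occurrences of an n-bp site in a random genome.
# ASSUMPTIONS: ~3e9 bp haploid genome, equal base frequencies, both strands.

GENOME_BP = 3.0e9


def site_length(n_fingers, bp_per_finger=3):
    """Number of base pairs read by a protein with n_fingers ZF units."""
    return n_fingers * bp_per_finger


def expected_occurrences(site_bp, genome_bp=GENOME_BP, strands=2):
    """Expected count of a specific site of length site_bp."""
    return strands * genome_bp / 4 ** site_bp


for n_fingers in (3, 6):
    bp = site_length(n_fingers)
    print(f"{n_fingers} fingers, {bp} bp: "
          f"{expected_occurrences(bp):.3g} expected occurrences")
```

Under these assumptions a 9-bp site is expected roughly 2 × 10^4 times, whereas an 18-bp site is expected fewer than 0.1 times, which is why an 18-bp address is described as potentially unique.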
| Regulating TF Expression |
|---|
| From Genes to Phenotypes: toward Regulating Endogenous Gene Expression Using ZF-Based TFs |
|---|
| Artificial Transcription Factors Regulating Specific Drug-Target and Disease Genes |
|---|
Beerli and coworkers (Beerli et al., 1998, 2000a) have constructed two six-ZF proteins able to specifically regulate the erbB-2 and erbB-3 proto-oncogenes in several cancer cell lines. These oncogenes are overexpressed in a majority of breast cancer tumors and play an essential role in regulating proliferation of breast cancer cells. The two ZF proteins targeted two highly related DNA sequences (15 of 18 bp identical) in the 5'-untranslated regions of erbB-2 and erbB-3. Independent regulation of one gene but not the other demonstrated that the designed ZF proteins were able to regulate their endogenous target genes with a high degree of specificity. These genes were up-regulated by attaching the VP64 activation domain and down-regulated by linking a transcriptional repression domain, KRAB. It is noteworthy that cancer cell lines expressing these regulators by retroviral delivery recapitulated the cell cycle alterations induced by gain or loss of function of the erbB-2 and erbB-3 oncogenes (Beerli et al., 1998, 2000a). Holbro et al. (2003) used these artificial TFs to demonstrate the essential role of erbB-3 in conjunction with erbB-2 in regulating breast tumor cell proliferation.
Corbi et al. (2000) constructed a designed TF able to bind and activate a transgene containing the utrophin gene promoter. Up-regulation of this gene could be a therapeutic strategy for Duchenne muscular dystrophy. Other examples of disease genes targeted with TFs are IGF2 and H19, involved in cancer and Beckwith-Wiedemann syndrome, respectively (Jouvenot et al., 2003). These genes, which are silenced by natural imprinting mechanisms in the disease state, were reactivated by an artificial TF.
In mammalian cells, several designed three-ZF proteins have been directed to regulate genes controlling angiogenesis. Angiogenesis is the process of new blood vessel formation, which is critical for tumor development. Therefore, these genes have become attractive targets for therapeutic regulation. ZF-based TFs have recently been targeted to the promoter of VEGF-A (Liu et al., 2001; Rebar et al., 2002). These proteins were able to activate expression of the endogenous gene, induce angiogenesis, and accelerate wound healing in mouse models. It is noteworthy that the new vasculature induced by the TFs was not hyperpermeable, a property not observed after simple cDNA delivery of the gene. These results demonstrated that artificial TFs could efficiently generate physiological effects in the context of the whole organism. Efficient repression of VEGF-A by artificial TFs was recently demonstrated by Snowden et al. (2002, 2003). In these studies, engineered ZFs recognizing the VEGF-A promoter were linked to a minimal histone methyltransferase domain. The authors showed that ZF-directed local methylation of histone H3 in cells triggered gene repression. The TFs were able to repress the gene in a highly tumorigenic cell line to levels comparable with those of a nonangiogenic, weakly tumorigenic cell line.
Bartsevich and Juliano (2000) and Xu et al. (2002) selectively down-regulated the MDR1 multidrug resistance gene with an artificial TF. In another recent report, Falke and Juliano targeted the pro-apoptotic Bax gene and showed that a designed five-ZF protein was able to induce apoptosis in p53-deficient cell lines (Falke et al., 2003). This suggests that designed ZF proteins may be used to induce apoptosis in cancer cells that have mutated or inactivated p53.
Tan et al. (2003) targeted CHK2, a key gene regulating cell cycle progression. The encoded protein kinase phosphorylates several substrates, including the tumor suppressor protein p53. The authors designed a six-ZF protein recognizing an 18-bp site in the promoter of the CHK2 gene. The artificial TF specifically repressed the CHK2 gene, as determined by DNA microarray experiments. It is noteworthy that the TF-induced repression elicited loss of phosphorylation of p53 in human cells.
In another recent report, Bartsevich et al. (2003) targeted the mouse Oct-4 gene, which is involved in differentiation of embryonic stem cells. TF technology could be used to regulate the cell fate of pluripotent stem cells, perhaps redirecting specific differentiation programs. These TFs could be used as therapeutic tools to regulate tissue regeneration from stem cells.
Another important functional application of designed TFs is described by Ren et al. (2002). The authors used specific TFs targeted against two different promoters to identify the functionally relevant isoform of the gene PPARγ, which is involved in adipogenesis.
| Artificial TFs As Antiviral Tools |
|---|
| Targeting Genes with DNA-Modifying Enzymatic Domains |
|---|
| Regulating Gene Expression in Plants |
|---|
Another emerging application of TF technology is the production of proteins and pharmacologically active plant metabolites. TFs constitute new tools to increase the production of metabolites, such as flavonoids and alkaloids, by activating multiple enzymes involved in biosynthetic pathways and repressing others (Gandet and Memelink, 2002).
| TF-Based, Genome-Wide Strategies to Regulate Gene Expression: TF Libraries for the Modification of Phenotypes and Gene Discovery |
|---|
To functionally select genomic sequences that can be targeted by TFs and therefore used to efficiently modify endogenous transcription, methods have been developed for screening combinatorial TF libraries in mammalian cells (Blancafort et al., 2003). Such TF libraries are composed of modified ZF domains for every targetable 3-bp site, randomly assembled into three- and six-ZF TFs. When delivered into a population of mammalian cells, large TF libraries can interact with many different regions of genomic DNA, on the order of hundreds of unique potential binding sites per gene. TF library members "scan" the genome for accessible, transcriptionally open DNA sequences. A variety of assays can be applied to identify cells displaying a phenotypic change, such as induced expression of a surface marker or altered cell morphology, resulting from the TF activation or repression of one or more genomic loci. TFs inducing the phenotype of interest can then be used as molecular probes to isolate relevant regulatory regions, to discover genes, and to provide insights into the coregulation of genes in a given pathway. In this sense, TF libraries can be regarded as a functional genomics tool, linking functional regulatory sequences in complex genomes with cellular phenotypes. Barbas and coworkers have performed selections of TF libraries in several cancer cell lines to regulate genes crucial to tumor biology and tumor progression. Selections were performed by cell sorting using antibodies recognizing specific antigens that were differentially regulated on the surface of tumor cells. TFs have been isolated from TF libraries that specifically up- and down-regulate many important molecules, such proto-oncogenes as erbB-2, such angiogenic molecules as CD144 (VE-cadherin; Blancafort et al., 2003), and such cell adhesion molecules as ICAM-1 (Magnenat et al., 2004).
TFs selected from combinatorial libraries are able to regulate a given target gene directly (by interacting with the promoter) or indirectly (by regulating upstream genes controlling target gene transcription). To select TFs able to regulate the erbB-2 gene directly, Lund et al. (2004) developed a novel phage display strategy to select ZF proteins from combinatorial libraries that bind the proximal erbB-2 promoter. The authors isolated promoter-binding TFs that were able to regulate the endogenous erbB-2 gene.
Our recent studies have isolated artificial TFs modulating complex phenotypes in cancer cells, such as cell growth, proliferation, resistance to drugs, and metastasis (P. Blancafort, manuscript in preparation). These investigations have discovered and regulated genes involved in tumor progression. Therefore, artificial TFs have demonstrated their potential for therapeutic reprogramming of cancer cell phenotypes.
Bae et al. (2003) have produced similar TF libraries by PCR amplification of endogenous human ZFs. These TF libraries could be used to modulate cellular phenotypes, such as yeast drug resistance and mammalian cell differentiation. In combination with other genomic approaches, such as DNA microarrays and chromatin immunoprecipitation, genome-wide strategies could provide candidate genomic targets that are relevant for drug discovery in complex diseases, such as tumor progression. Another functional application of combinatorial TF libraries was described by Lee et al. (2003). Expression of a TF library combined with cDNA microarray technology provided a tool to cluster and classify groups of genes that are actively transcribed in many different cellular backgrounds.
| TFs versus RNA-Based Methods to Regulate Gene Expression |
|---|
~22-bp double-stranded sRNAs called microRNAs or small interfering RNAs (siRNAs). Since the discovery of the efficacy of this approach in Caenorhabditis elegans in 1998 (Fire et al., 1998
~22-mer, single-stranded RNA molecules that affect gene expression using many of the same protein components of the siRNA pathway. The main difference between miRNA and siRNA is that miRNA causes translational repression of the gene target without cleavage of the mRNA. This seems to occur because the miRNA binds to its gene target with imperfect complementarity (Bartel, 2004).
Several interesting features can be compared between DNA-targeting artificial TFs and RNA-targeting methods such as RNAi and antisense, particularly regarding delivery, specificity, and function (Table 2). Delivery remains a formidable obstacle to the use of these technologies in humans. TFs, siRNAs, and antisense agents can be delivered to target cell types transiently (using transfection reagents) or stably (using retroviral, adenoviral, or lentiviral vectors). The in vivo stability and half-life of antisense agents and siRNAs are a primary concern and can be improved by chemical synthesis using modified nucleotides. However, synthesis of such compounds can be expensive, and many labs today use vector-derived transfections. For efficient gene knock-down, RNAi requires high expression levels that can be achieved with polymerase III promoters. Cells do not naturally take up TF proteins, so transient delivery requires transfection of TF-encoding cDNA. However, the stability of artificial TFs is comparable with that of naturally occurring ones, and because TFs act directly as transcriptional regulators, they do not require high-level expression to achieve biological effects. For all these technologies, the use of tissue-specific promoters is perhaps the system of choice to express these artificial regulators in the proper target organ or tissue. However, additional mechanisms to control regulator function may be available to artificial TFs, such as activation by small molecule ligands as described earlier.
Specificity for both TFs and RNA strategies is achieved through base contacts. More base contacts generally provide better specificity. The upper limit of this reasoning is reached when the binding energy becomes so strong that a mismatched base contact can no longer sufficiently destabilize the binding complex. In practice, however, pragmatic concerns usually govern the size of the binding site. For example, extending the number of ZF units can extend the number of specific TF contacts with the DNA. Because a six-ZF TF can potentially recognize a unique 18-bp site in the human genome, there is little practical reason to exceed this binding site size. Six-ZF TFs have been shown to have higher affinities and better discrimination than three-ZF TFs (Beerli et al., 1998; Blancafort et al., 2003; Lund et al., 2004), and certain designed six-ZF TFs were found to regulate only their single targeted genes based on microarray analysis (Guan et al., 2002; Tan et al., 2003). The specificity of siRNA is governed by Dicer and associated proteins that function optimally with ~22-bp molecules. Although a site of this length should provide unique targeting in the human genome, recent expression profiling has demonstrated off-target gene regulation with siRNA, indicating that the full 22-bp specificity is not realized (Jackson et al., 2003). It should be emphasized, however, that more studies of this type will be required for a proper evaluation of specificity for any of these regulatory methods, and investigators should be encouraged to perform such studies.
As for the ability to build a regulator that can bind an optimal binding site using present-day technology, it might seem, a priori, that siRNA and antisense have an advantage. The spectrum of sequences that can be targeted by artificial TFs is somewhat limited by the existing lexicon of zinc finger domains. Although the current technology is still sufficient to create more than a billion proteins, with the potential to recognize a targetable sequence every 32 nucleotides, recognition of C- and T-rich sequences remains challenging. In the case of the RNA technologies, simple Watson-Crick base pairing rules allow recognition of any sequence. However, in practice, the primary technical barrier limiting the success of both the TFs and RNA technologies in vivo is not the number of targetable sites but the accessibility of those sites. Target sites may be blocked by endogenous binding factors, such as RNA-binding proteins or DNA-binding nucleosomes. siRNA and antisense strategies are additionally susceptible to unfavorable three-dimensional structures, which occur far more frequently in RNA than in DNA. As described above, a practical approach has been to construct regulatory agents to several target sites and determine empirically which function best. Combinatorial libraries of agents offer an alternative solution. Finally, it is instructive to consider that some antisense agents have been shown to exert non-sequence-dependent effects through interaction with other macromolecules (Khaled et al., 1996). This example should serve as a caveat to all investigators when considering how to evaluate specificity in their experiments.
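Figures of the magnitude quoted above ("more than a billion proteins" and "a targetable sequence every 32 nucleotides") can be reproduced with simple counting. The sketch below is one possible reading, not a statement of how the authors derived them: it assumes a lexicon of 32 domains, each recognizing a distinct triplet out of the 64 possible, six-finger proteins, independent and equally frequent triplets, and sites on both DNA strands. The lexicon size of 32 is an assumption chosen for illustration.

```python
# Counting sketch for combinatorial ZF proteins.
# ASSUMPTIONS: 32 domains covering 32 distinct triplets (illustrative value),
# independent and equally frequent triplets, both strands considered.

TOTAL_TRIPLETS = 4 ** 3  # 64 possible 3-bp subsites


def library_size(n_domains, n_fingers):
    """Number of distinct proteins from free assembly of the domains."""
    return n_domains ** n_fingers


def targetable_spacing(n_domains, n_fingers, strands=2):
    """Average distance in nucleotides between fully targetable sites."""
    p_triplet = n_domains / TOTAL_TRIPLETS  # chance one triplet is covered
    p_site = p_triplet ** n_fingers         # chance all triplets are covered
    return 1 / (strands * p_site)


print(library_size(32, 6))        # 1073741824
print(targetable_spacing(32, 6))  # 32.0
```

With a smaller lexicon the spacing grows quickly; for example, 16 domains under the same assumptions would give one targetable 18-bp site roughly every 2048 nucleotides, which illustrates why expanding the lexicon to C- and T-rich triplets matters.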
The functions of artificial TFs and RNA technologies differ significantly once they arrive at their specific target sites. Most obviously, TFs target DNA sites, of which there are typically two or fewer copies in the cell. In contrast, there will be many more copies of mRNA produced from each gene. For highly expressed genes (for example, oncogenes overexpressed in cancer cells as a result of multiple gene duplications), RNAi might not eliminate the total population of target RNA, and substantial protein product could elicit some residual phenotype. In this case, TFs and siRNAs might be contemplated as companion technologies that could work in synergy to down-regulate gene expression by both reducing the rate of RNA production and increasing specific degradation. Another significant difference is that artificial TFs have the ability to both up- and down-regulate transcriptional levels of a given gene (depending on the effector domain), and thus either gain- or loss-of-function phenotypes are accessible. In contrast, RNAi and antisense can only be applied to negatively regulate RNA levels, at least in the direct sense. This difference is important in the context of molecular therapeutics, because transcriptional levels of a given gene can oscillate depending on the cell type and the disease stage. Therapeutic application or drug target validation studies may also require up-regulation rather than down-regulation of a particular gene.
| Summary and Outlook |
|---|
Two general strategies have been developed to generate TFs that can regulate endogenous genes. The first is a de novo targeting strategy by which a particular DNA binding site is chosen in a promoter of interest near the transcription start site of a gene (Fig. 4A). Information regarding chromatin accessibility and endogenous TF binding sites is required to choose accessible sites. Target sites are chosen, and polydactyl proteins are constructed based on the existing lexicon of modified ZF domains and the "targetable" DNA triplets available in the accessible region. Binding and specificity of these custom-designed proteins are verified first with DNA binding assays in vitro, then with reporter gene assays, and finally in the chromatin context with assays measuring specific endogenous gene regulation.
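The site-selection step of the de novo strategy can be sketched as a scan of a promoter sequence for windows made up entirely of targetable triplets. The code below is a minimal illustration under stated assumptions: the set of targetable triplets is taken to be all 5'-GNN-3' triplets purely as an example, the function names are hypothetical, and chromatin accessibility and endogenous TF binding sites, which the text says must also be considered, are not modeled.

```python
# Minimal sketch: find candidate ZF target sites on both strands of a promoter.
# ASSUMPTIONS: GNN triplets stand in for the "targetable" lexicon; accessibility
# and the cross-strand overlap contact are ignored.

from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
GNN = {"G" + a + b for a, b in product("ACGT", repeat=2)}  # illustrative set


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def is_targetable(site, targetable):
    """True if every consecutive triplet of the site is in the lexicon."""
    return all(site[i:i + 3] in targetable for i in range(0, len(site), 3))


def find_sites(promoter, targetable, n_fingers=6):
    """Return (start on + strand, strand, site written 5'->3') for each hit."""
    size = 3 * n_fingers
    promoter = promoter.upper()
    hits = []
    for strand, seq in (("+", promoter), ("-", reverse_complement(promoter))):
        for i in range(len(seq) - size + 1):
            site = seq[i:i + size]
            if is_targetable(site, targetable):
                # Convert minus-strand offsets back to plus-strand coordinates.
                start = i if strand == "+" else len(seq) - i - size
                hits.append((start, strand, site))
    return sorted(hits)


print(find_sites("TTGAAGGCGTACC", GNN, n_fingers=3))
```

In practice the candidate list would then be filtered by accessibility data and tested empirically, as the verification sequence described above suggests.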
The second strategy involves the creation of combinatorial TF libraries for functional screening in living cells or organisms (Fig. 4B). In this case, TF library members eliciting the strongest biological effect can be selected in the first step. Thereafter, information regarding the bound DNA sequence (based on the known DNA-recognition domains of the selected DBDs), the binding location (based on chromatin immunoprecipitation assays), and specificity of regulation (based on DNA microarray and genomic search assays) can be integrated to determine the putative genes targeted directly by the TF. Combinatorial TF libraries have been built using both synthetic and natural ZF domains. Such libraries have been shown to be powerful tools for modification of phenotypes and have opened new pharmacogenomic approaches to the discovery of genes and regulatory regions involved in disease. Both approaches, de novo design of TFs and selection of TFs from combinatorial libraries, represent powerful complements to existing methods for genetic manipulation. Artificial TFs exploit the inherent transcriptional capabilities of cells to modify cellular functions. This is especially interesting in the context of diseases that are able to progress or evolve by changes in transcriptional programs in a given cell type. Another advantage of TFs is their ability to activate specific promoters within a transcriptional unit and generate the transcript isoforms that are relevant in a particular cellular background. As described elsewhere (Segal, 2002), artificial TFs can be constructed easily by any investigator using published information, without the need to employ exotic techniques such as phage display or to collaborate with a specialized zinc finger laboratory. Finally, artificial TFs can be used to overcome existing cDNA patents (Jamieson et al., 2003).
A potential limitation of ZF-based TF design is that structural features of the current zinc finger domains may ultimately impose restrictions on the spectrum of recognizable DNA sequences. Binding specificity is largely determined by the orientation of the α-helix and the amino acids it displays in the major groove. Because all current domains used to construct custom ZF proteins have been based on ZF2 of Zif268, future domains might benefit from experimentation with different ZF frameworks. For example, such new domains might position the recognition helix closer to the second DNA strand to allow additional specific interactions. At the moment, specificity can be improved in vitro by adding ZF domains, which increases the number of potential specific interactions with the DNA. However, in the context of a complex genome, the addition of ZF domains decreases the number of potential TF binding sites. Future development of TF technology should additionally take into account the ability of TFs to access and/or modify chromatin in silent promoters, perhaps incorporating novel domains able to control these processes.
| Footnotes |
|---|
Address correspondence to: Carlos F. Barbas III, Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037. E-mail: carlos{at}scripps.edu
| References |
|---|
Akopian A, He J, Boocock MR, and Stark WM (2003) Chimeric recombinases with designed DNA sequence recognition. Proc Natl