Elsevier

Journal of Biotechnology

Volume 119, Issue 3, 29 September 2005, Pages 219-244

Development of a large-scale chemogenomics database to improve drug candidate selection and to understand mechanisms of chemical toxicity and action

https://doi.org/10.1016/j.jbiotec.2005.03.022

Abstract

Successful drug discovery requires accurate decision making in order to advance the best candidates from initial lead identification to final approval. Chemogenomics, the use of genomic tools in pharmacology and toxicology, offers a promising enhancement to traditional methods of target identification/validation, lead identification, efficacy evaluation, and toxicity assessment. To realize the value of chemogenomics information, a contextual database is needed to relate the physiological outcomes induced by diverse compounds to the gene expression patterns measured in the same animals. Massively parallel gene expression characterization coupled with traditional assessments of drug candidates provides additional, important mechanistic information, and therefore a means to increase the accuracy of critical decisions. A large-scale chemogenomics database developed from in vivo treated rats provides the context and supporting data to enhance and accelerate accurate interpretation of mechanisms of toxicity and pharmacology of chemicals and drugs. To date, approximately 600 different compounds, including more than 400 FDA approved drugs, 60 drugs approved in Europe and Japan, 25 withdrawn drugs, and 100 toxicants, have been profiled in up to 7 different tissues of rats (representing over 3200 different drug–dose–time–tissue combinations). Accomplishing this task required evaluating and improving a number of in vivo and microarray protocols, including over 80 rigorous quality control steps. The utility of pairing clinical pathology assessments with gene expression data is illustrated using three anti-neoplastic drugs: carmustine, methotrexate, and thioguanine, which had similar effects on the blood compartment, but diverse effects on hepatotoxicity. We will demonstrate that gene expression events monitored in the liver can be used to predict pathological events occurring in that tissue as well as in hematopoietic tissues.

Introduction

The drug discovery and approval process is long, expensive, and inefficient: fewer than 1 in 10 promising drug candidates entering phase I clinical trials results in an approved product, so the decisions made prior to phase I are inaccurate for more than 90% of drug candidates. There are a number of reasons for this high failure rate, generally stemming from the absence of a full understanding of the biological and toxicological properties and mechanisms of the drug candidate. Comparison of the effects of a candidate compound on rats against an existing database of multiple parameters measured for a large number of marketed drugs, subsequently withdrawn drugs, and known toxicants provides the context to interpret the findings for the candidate and improve the accuracy of development decisions. Analysis of the transcriptional response in the cells of target organs can give an indication of the biochemical or biological mechanism affected by a pharmaceutical compound. Over the last few years, the toxicology community has begun to employ genome-wide gene expression profiling to dissect the mechanisms behind chemical toxicity and to use this information to increase the accuracy and sensitivity of toxicity testing. Specific gene expression profiles have been evaluated in several types of toxicological studies (Burczynski et al., 2000; Debouck and Goodfellow, 1999; Hamadeh et al., 2002; Harris et al., 2001; Nuwaysir et al., 1999; Waring et al., 2001a; Waring et al., 2001b). In this paper, we describe how this approach, termed “toxicogenomics”, or more generally “chemogenomics”, has been significantly extended, both by expanding the number of endpoints measured to include clinical chemistry, hematology, histopathology (using a standardized histopathology vocabulary and grading system), and organ weights, and by profiling a large number of compounds in more than one tissue.
We demonstrate that coupling gene expression profiling with traditional toxicity measurements enhances the understanding of individual compound effects in rats. Furthermore, although beyond the scope of this paper to discuss in detail, the database contains a great deal of additional information. For example, the gene expression results for each compound are tied to its in vitro pharmacological activity, by first measuring each purified compound in 130 primarily human in vitro molecular pharmacology bioassays that measure selectivity, specificity, and affinity for receptor binding, cytochrome P450 activity and drug target enzymatic activities.

Currently, the database contains the profiles derived from administering approximately 600 different compounds to rats (400 FDA approved drugs, 60 drugs approved in Europe and Japan, 25 withdrawn drugs, 27 standard biochemicals, and 100 molecules of toxicological interest). These have been profiled in up to 7 different tissues, for a total of 3200 different drug–dose–time–tissue combinations. This system is designed to assist drug discovery professionals in the selection of the highest quality leads and drug candidates at the earliest, most cost-effective stages of drug discovery and development to eliminate likely failures as early as possible. In addition, scientists interested in the toxicology of industrial and agricultural chemicals may also benefit from new mechanistic and toxicological insights that can be gathered from this large contextual database. Assembling these data is a complex task as each microarray experiment includes 228 steps from compound selection, through in vivo animal dosing, tissue harvesting, RNA isolation, cRNA preparation, array hybridization, and finally to signal detection, processing, and data uploading.

Execution of a microarray experiment requires considerable time and effort to process even a modest number of samples. Most of the methods available for processing samples are manual procedures that are not easily adaptable to a controlled large-scale laboratory. Fully exploiting the promise of DNA array technology requires the ability to rapidly process large numbers of samples and to generate large collections of high quality expression profiles; thus automation and standardization of many of the experimental processes are required. Since a wide range of factors can affect microarray data quality, it is important to understand the various parameters involved (Holloway et al., 2002; Quackenbush, 2002; Yang and Speed, 2002). These parameters include any conditions that lead to unwanted changes in gene expression before RNA extraction, such as variations in the environment of the animals or cell cultures, or in experimental designs, as well as variations in post-RNA extraction steps such as sample processing protocols and data processing procedures. In order to control and optimize these parameters, it is necessary to understand the performance characteristics of a microarray platform and the processing protocols. Important properties are data reproducibility within and between arrays, and within and between animal experiments; this paper shows how attention to these details can improve the quality and reproducibility of the results. We demonstrate a reduction in hybridization variation from 42% to 20% over a period of 17 months.
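
As a rough illustration of how reproducibility between replicate hybridizations might be tracked over time, the sketch below computes a per-gene coefficient of variation (CV) across replicate arrays and summarizes it as a median. Treating "hybridization variation" as a CV is an assumption made here for illustration only; the text above does not specify the metric, and the function name and toy intensities are hypothetical.

```python
import statistics


def median_cv_percent(replicates):
    """Median per-gene coefficient of variation (%) across replicate arrays.

    `replicates` is a list of arrays; each array is a list of positive
    signal intensities, one per gene, in the same gene order.
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicate arrays")
    n_genes = len(replicates[0])
    if any(len(r) != n_genes for r in replicates):
        raise ValueError("all arrays must cover the same genes")

    cvs = []
    # zip(*replicates) yields the values of one gene across all arrays.
    for gene_values in zip(*replicates):
        mean = statistics.mean(gene_values)
        sd = statistics.stdev(gene_values)  # sample standard deviation
        cvs.append(100.0 * sd / mean)
    return statistics.median(cvs)


# Toy intensities (hypothetical, not data from the study).
toy = [
    [100.0, 200.0, 50.0],
    [110.0, 180.0, 55.0],
    [90.0, 220.0, 45.0],
]
print(round(median_cv_percent(toy), 1))
```

Computing the same summary for each batch of replicate hybridizations would give a single number per batch that can be followed across months of process changes.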

To address these issues, this paper is divided into four major sections: discussion of the protocol and design considerations for building a large-scale chemogenomic database; discussion of process optimization; summary of the results of clinical chemistry, hematology, histopathology, and organ weight findings; and finally a demonstration of the usefulness of the database using a case study.

The case study uses three anti-cancer drugs – carmustine, methotrexate, and thioguanine – as examples to demonstrate the utility of the entire dataset across five different data domains (microarray RNA expression data, clinical chemistry, hematology, organ weight, and histopathology). Their well-characterized toxicological effect of bone marrow depletion was confirmed by hematology data analysis that demonstrated depleted leukocytes and lymphocytes in peripheral blood samples. Analysis of steady-state expression levels of mRNAs in multiple tissues (including whole blood) and examination of gene expression changes in the liver identified blood-selective genes whose mRNAs were systemically repressed by these and other anti-cancer compounds. Based on the identity of one of these genes, aminolevulinic acid synthase 2 (Alas2), reticulocyte depletion was predicted and confirmed experimentally, suggesting that this gene can serve as an indicator of bone marrow toxicity, even if measured in other, non-hematopoietic, tissues. The usefulness of pairing clinical data with expression data in a contextual database is further demonstrated by showing that while the three compounds had similar gene expression patterns relating to their toxicity towards the blood compartment, they had divergent profiles relating to their differing degrees of hepatotoxicity.
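
The two-step selection described above (find transcripts that are selective for blood, then ask which of them are repressed in liver after treatment) can be sketched as a simple filter. This is a minimal sketch under stated assumptions, not the authors' analysis: the function name, the fold-selectivity and log-ratio thresholds, and the placeholder gene names and values are all hypothetical.

```python
def blood_selective_repressed(baseline, liver_log_ratio,
                              selectivity=4.0, repression=-1.0):
    """Return genes that are blood-selective and repressed in liver.

    baseline:        {gene: {tissue: expression level in untreated animals}}
    liver_log_ratio: {gene: log2(treated / control) measured in liver}
    selectivity:     assumed minimum fold excess of blood over every other tissue
    repression:      assumed maximum log2 ratio that counts as repressed
    """
    hits = []
    for gene, levels in baseline.items():
        others = [v for tissue, v in levels.items() if tissue != "blood"]
        if "blood" not in levels or not others:
            continue  # cannot judge selectivity without both kinds of tissue
        is_selective = levels["blood"] >= selectivity * max(others)
        is_repressed = liver_log_ratio.get(gene, 0.0) <= repression
        if is_selective and is_repressed:
            hits.append(gene)
    return sorted(hits)


# Placeholder genes and values (hypothetical, not data from the study).
baseline = {
    "gene_a": {"blood": 800.0, "liver": 40.0, "kidney": 35.0},
    "gene_b": {"blood": 300.0, "liver": 250.0, "kidney": 200.0},
    "gene_c": {"blood": 900.0, "liver": 50.0, "kidney": 60.0},
}
liver_log_ratio = {"gene_a": -1.8, "gene_b": -2.0, "gene_c": 0.1}

print(blood_selective_repressed(baseline, liver_log_ratio))
```

In this toy input only `gene_a` passes both conditions: `gene_b` is repressed but not blood-selective, and `gene_c` is blood-selective but not repressed.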

Section snippets

Gene expression data

The gene expression data from the 69 arrays supporting the case study described within this publication are available at NCBI's Gene Expression Omnibus (http://ncbi.nlm.nih.gov/geo/) under the series entry accession #GSE2409.
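
For readers who want to locate the series programmatically, the sketch below builds the record and download-directory URLs for a GEO series accession. The URL layout reflects GEO's public conventions as understood at the time of writing and is an assumption that may change; the helper name `geo_series_urls` is hypothetical, and no network request is made.

```python
import re


def geo_series_urls(accession):
    """Build GEO record and download-directory URLs for a series accession."""
    match = re.fullmatch(r"GSE(\d+)", accession)
    if match is None:
        raise ValueError(f"not a GEO series accession: {accession!r}")
    digits = match.group(1)
    # GEO groups series into directories by replacing the last three
    # digits of the accession number with "nnn" (e.g. GSE2409 -> GSE2nnn).
    stub = "GSE" + digits[:-3] + "nnn"
    return {
        "record": f"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={accession}",
        "download_dir": f"https://ftp.ncbi.nlm.nih.gov/geo/series/{stub}/{accession}/",
    }


urls = geo_series_urls("GSE2409")
print(urls["record"])
print(urls["download_dir"])
```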

Compounds

The compounds used for the studies described here were obtained from a variety of different sources including Axxora [formerly Alexis Biochemicals] (USA), Bachem Chemicals (USA), Fisher Scientific [Acros Chemical Division] (USA), Onyx Scientific (UK), Sequoia Research Products (UK),

Highly standardized in vivo biology protocol

The effectiveness of a chemogenomic database relies heavily on thoughtful compound selection and data reproducibility, which in turn relies on standardized protocols. There were several protocols that required full standardization for the generation of consistent and high quality array data. These include the compound and dose selection protocols, the in vivo biology (exposure time and animal data collection protocols) and the array processing protocols (RNA isolation, cRNA preparation, array

Contextual database

The drug discovery process is an extraordinarily difficult and expensive endeavor, in which more than 90% of promising drug candidates fail to obtain regulatory approval. There are a number of reasons for this high attrition rate, important ones being the discovery in clinical trials of a previously unnoticed or misunderstood toxicity, or a lack of efficacy due to a poor understanding of the target's function and pharmacology. We have developed a system that we believe will lower

Acknowledgments

The authors would like to thank Ving Lee for his chemistry insight and Ken Zaret and David O’Reilly for critical reading of the manuscript. Sandra Phillips and Jon Mirsalis at SRI International are also gratefully acknowledged. We also would like to thank all the members of the Iconix array facility, including James Batson, for providing array data and everyone else at the company for helpful discussions.

Portions of the statistical analyses for this paper were generated using SAS/STAT™

1 They contributed equally to this work.

2 Present address: Agilent Tech. Inc., 3500 Deer Creek Road, Palo Alto, CA 94304, USA.

3 Present address: Mpex BioScience Inc., 500 Campanile Drive, San Diego, CA 92182, USA.

4 Present address: Integrium, 14351 Myford Road, Tustin, CA 92780, USA.

5 Present address: Nuomics Consult. Ltd., 3 Merlin Drive, Ely, Cambridge CB6 3EA, UK.

6 Present address: 2077 Tapscott Avenue, El Cerrito, CA 94530, USA.

7 Present address: 3656 Jefferson Avenue, Redwood City, CA 94062, USA.

8 Present address: Pharmacopeia Drug Discovery, Inc., P.O. Box 5350, Princeton, NJ 08543-5350, USA.
