  • Review Article

Statistical practice in high-throughput screening data analysis

Abstract

High-throughput screening is an early critical step in drug discovery. Its aim is to screen a large number of diverse chemical compounds to identify candidate 'hits' rapidly and accurately. Few statistical tools are currently available, however, to detect quality hits with a high degree of confidence. We examine statistical aspects of data preprocessing and hit identification for primary screens. We focus on concerns related to positional effects of wells within plates, choice of hit threshold and the importance of minimizing false-positive and false-negative rates. We argue that replicate measurements are needed to verify assumptions of current methods and to suggest data analysis strategies when assumptions are not met. The integration of replicates with robust statistical methods in primary screens will facilitate the discovery of reliable hits, ultimately improving the sensitivity and specificity of the screening process.
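As a minimal sketch of what a robust hit threshold can look like, the Python example below scores each compound on one plate with a robust z-score (median and median absolute deviation in place of mean and standard deviation) and flags wells below a cutoff. The function names, the cutoff of 3 and the assumption that hits are low readings (an inhibition assay) are illustrative choices, not the authors' procedure.

```python
import statistics

# Scale factor that makes the MAD comparable to the standard deviation
# when the data are normally distributed.
MAD_SCALE = 1.4826


def robust_z_scores(values):
    """Return (value - median) / (1.4826 * MAD) for each measurement."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(v - med) / (MAD_SCALE * mad) for v in values]


def select_hits(values, threshold=3.0):
    """Return indices of wells whose robust z-score is <= -threshold.

    Assumption for illustration: active compounds lower the signal.
    """
    scores = robust_z_scores(values)
    return [i for i, s in enumerate(scores) if s <= -threshold]


# Hypothetical plate readings; the well at index 8 is strongly inhibited.
readings = [100, 102, 98, 101, 99, 103, 97, 100, 40, 101]
hits = select_hits(readings)
```

Because the median and MAD are barely moved by a few extreme wells, the hits themselves do not inflate the spread estimate the way they would with a mean and standard deviation. The cutoff still trades false positives against false negatives and would need to be justified for a given screen, ideally with replicate measurements.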

Figure 1
Figure 2: Typical location of controls on a 96-well plate.
Figure 3: Titration series in a translation assay.
Figure 4: Presence of edge effects in a high-throughput screen.
Figure 5: Replicates, false-positive and false-negative rates.
Figure 6: Verification of the assumptions of normally distributed data with constant variance among compounds.
Figure 7: Verification of the assumption that the within-compound variances follow an inverse gamma distribution.
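
Positional effects such as the edge effects named in the Figure 4 caption are often handled by removing row and column trends from each plate before hits are called. The Python sketch below uses Tukey's median polish for that purpose. It is one possible approach under stated assumptions, not necessarily the procedure used for the figures: the function name, the fixed number of sweeps and the 3 x 5 example plate are hypothetical.

```python
import statistics


def median_polish(plate, n_iter=10):
    """Return residuals after repeatedly removing row and column medians.

    `plate` is a list of equal-length rows of raw well measurements.
    The input is not modified.
    """
    resid = [list(row) for row in plate]
    n_cols = len(resid[0])
    for _ in range(n_iter):
        # Sweep out the median of every row.
        for row in resid:
            m = statistics.median(row)
            for j in range(n_cols):
                row[j] -= m
        # Sweep out the median of every column.
        for j in range(n_cols):
            m = statistics.median([row[j] for row in resid])
            for row in resid:
                row[j] -= m
    return resid


# Hypothetical plate: additive row and column trends plus one active well.
row_effect = [0, 5, 10]
col_effect = [0, 1, 2, 3, 4]
plate = [[100 + r + c for c in col_effect] for r in row_effect]
plate[1][2] -= 50  # simulated hit in row 1, column 2

residuals = median_polish(plate)
```

In this constructed example the residuals are zero everywhere except the simulated hit, which keeps its full deviation of -50. Real plates are noisier, and the additive row-plus-column model is itself an assumption that should be checked against the data.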

Acknowledgements

We thank Jing Liu and Janie Lapointe for generating the Figure 3 data. This work was supported by the “Informatics and Chemical Genomics” funding to R.N. under the Genome Quebec Phase II Bioinformatics Consortium program.

Author information

Corresponding author

Correspondence to Robert Nadon.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Malo, N., Hanley, J., Cerquozzi, S. et al. Statistical practice in high-throughput screening data analysis. Nat Biotechnol 24, 167–175 (2006). https://doi.org/10.1038/nbt1186

  • DOI: https://doi.org/10.1038/nbt1186
