The Bioperl Toolkit: Perl Modules for the Life Sciences

  1. Jason E. Stajich [1,18,19],
  2. David Block [2,18],
  3. Kris Boulez [3],
  4. Steven E. Brenner [4],
  5. Stephen A. Chervitz [5],
  6. Chris Dagdigian [6],
  7. Georg Fuellen [7],
  8. James G.R. Gilbert [8],
  9. Ian Korf [9],
  10. Hilmar Lapp [10],
  11. Heikki Lehväslaiho [11],
  12. Chad Matsalla [12],
  13. Chris J. Mungall [13],
  14. Brian I. Osborne [14],
  15. Matthew R. Pocock [8],
  16. Peter Schattner [15],
  17. Martin Senger [11],
  18. Lincoln D. Stein [16],
  19. Elia Stupka [17],
  20. Mark D. Wilkinson [2], and
  21. Ewan Birney [11]
  1. University Program in Genetics, Duke University, Durham, North Carolina 27710, USA
  2. National Research Council of Canada, Plant Biotechnology Institute, Saskatoon, SK S7N 0W9 Canada
  3. AlgoNomics, B 9052 Gent, Belgium
  4. Department of Plant and Molecular Biology, University of California, Berkeley, California 94720, USA
  5. Affymetrix, Inc., Emeryville, California 94608, USA
  6. Open Bioinformatics Foundation, Somerville, Massachusetts 02144, USA
  7. Integrated Functional Genomics, IZKF, University Hospital Muenster, 48149 Muenster, Germany
  8. The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA UK
  9. Department of Computer Science, Washington University, St. Louis, Missouri 63130, USA
  10. Genomics Institute of the Novartis Research Foundation (GNF), San Diego, California 92121, USA
  11. European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK
  12. Agriculture and Agri-Food Canada, Saskatoon Research Centre, Saskatoon SK, S7N 0X2 Canada
  13. Berkeley Drosophila Genome Project, University of California, Berkeley, California 94720, USA
  14. Cogina, New York City, New York 10022, USA
  15. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
  16. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
  17. Institute of Molecular and Cell Biology, 117609 Singapore

Abstract

The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been used successfully and repeatedly to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has proven flexible enough to support enterprise-level applications such as EnsEMBL while maintaining an easy learning curve for novice Perl programmers. Bioperl can execute analyses with, and process results from, programs such as BLAST, ClustalW, and the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt through a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit and the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.

[Supplemental material is available online at www.genome.org. Bioperl is available as open-source software free of charge and is licensed under the Perl Artistic License (http://www.perl.com/pub/a/language/misc/Artistic.html). It is available for download at http://www.bioperl.org. Support inquiries should be addressed to bioperl-l{at}bioperl.org.]

Footnotes

  • 18 Present address: Genomics Institute of the Novartis Research Foundation (GNF), San Diego, California 92121, USA.

  • 19 Corresponding author.

  • E-MAIL jason.stajich{at}duke.edu; FAX (919) 681-1035.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.361602.

    • Received May 4, 2002.
    • Accepted August 9, 2002.