Computer databases, networks, and software tools are essential resources for all aspects of genome analysis, as alluded to or discussed explicitly in previous chapters of this manual and in other books in this series. The popular but imprecise term bioinformatics is used to describe a spectrum of methods and activities. These range from laboratory information management systems, through data analysis, interpretation, and integration, to document preparation and electronic publishing in the form of submitting sequence and mapping data to public databases (Boguski, 1994). In this chapter, we will touch on all of these topics but focus most of our attention on the analysis and annotation of DNA and protein sequence data.
Complete sequences have already been determined for several prokaryotic genomes and eukaryotic chromosomes. In addition, expressed sequence tag (EST) surveys will continue to yield comprehensive sets of transcripts for humans and for many model organisms, such as the mouse. Because of developments such as these, biologists will need more than a passing acquaintance with sequence analysis and annotation methods. In this chapter, we will not attempt to review the many excellent commercial sequence analysis programs that are available. Instead, we will concentrate on selected databases and software tools that are freely available in the public domain.
Because bioinformatics is a field in flux, with new techniques continuously being developed, some of the Internet addresses and Web sites listed in this chapter will inevitably change. For this reason, the publishers have decided to make this chapter available in electronic format on the World Wide Web at http://www.cshl.org:80/books/g_a/bk1ch7/. If you encounter an invalid address or site while using the printed version of this chapter, please refer to the electronic version for updated information.