One of the books that recently came out of this new focus is “Developing Bioinformatics Computer Skills”. This text is written by Cynthia Gibas and Per Jambeck primarily for the bioscientist or bioscience student without a background in computer science, programming or the theoretical constructs that data-driven computation is based upon. I figured that this described me pretty well: like most bioscience folks getting into informatics, I had no formal training in bioinformatics or computer science, although I am computationally literate and have long used computers as far more functional tools than simple word processing or data presentation machines. Furthering my anxiety at embarking on an investigation of this seemingly complex field, I had no real experience with databases or programming beyond BASIC, Pascal and a little IDL code. Fortunately, this text functions as an introduction to bioinformatics that assumes the reader has had some coursework in chemistry and basic molecular biology, but no real programming experience (well, perhaps a little).
The first chapter functions as an introduction, defining the field and introducing the rest of the text, while the second chapter reviews some basic molecular biology and quickly covers the molecular and genetic applications that created the need for a more structured study of biological data and its interactions. Unfortunately, there are some errors in the review section on molecular biology, but they don’t detract significantly from the message. The next three chapters concern the computational environment or workstation upon which the rest of the book is based, with an introduction to the UNIX OS environment and some basic commands used to navigate your typical UNIX system. The text has an obvious Linux bias, but that should not prevent OS X users from using the same commands within the terminal application of OS X or even downloading and working with programs discussed in later chapters of the book. In fact, all the caveats discussed in the book with respect to Linux are resolved with OS X, further enhancing the prospect of using a Macintosh as the ideal bioinformatics workstation platform. Availability of good Linux laptops can also be an issue, and I will tell you an iBook makes for one sweet portable bioinformatics workstation as long as it’s running OS X. I’ve covered why OS X makes a better alternative than Linux or other operating systems in the previous article on bioinformatics.
The rest of the book gets into the actual meat of the typical tools used for work in biological informatics, including web-based tools such as PubMed and GenBank, among others, that everyone in the biosciences should be familiar with. Work with DNA, including sequence analysis, alignment searches and database queries, is also covered. Chapter 9 then discusses what is near and dear to the book authors’ hearts, protein structure and the properties of protein structure, while chapter 10 bravely discusses how to engage in the black magic of predicting protein structure and function based upon the primary sequence. Most of chapter 11 concerns more tools for genomics and proteomics research, but it concludes with a brief but fascinating discussion of biochemical pathway databases. I say fascinating because, while the discussion is brief, it is important: most of the tools for traditional bioinformatics research are centered around genomics and proteomics due to ease of implementation. However, small groups of folks are beginning to examine how to bring other types of data, such as those from histology, pharmacology and other techniques, into an analytical approach that can be integrated into the rest of the bioinformatics paradigm. This need to integrate other methodologies will become even more important in a post-genomic world where, after the identification and precise sequencing of genes, we will need to figure out what functions (if any) those genes have and when (and if) they might be active. Determining this will require a renaissance of older, more established disciplines such as physiology and anatomy, which have been overlooked amid the rise and glamour of molecular biology during the past two decades. The data formats that physiological and anatomical techniques generate are not easily integrated with traditional bioinformatics, so I am pleased to see at least some discussion (albeit brief) of some of these issues in an introductory text.
The last portion of the book then considers databases and the visualization of information contained in databases. Chapter 12 introduces Perl and describes why it is important and useful to bioinformatics. Automating data analysis with Perl is then discussed, followed in chapter 13 by instruction in the design and development of databases. This section, like any other chapter in the book, could fill volumes of text, but databases in particular can be cumbersome and difficult to learn, and it can be hard to determine how to approach them for one’s particular data type. However, this text does an admirable job given its scope. Finally, chapter 14 introduces the reader to data mining techniques and data visualization.
All in all, the text is well written, properly organized and a worthwhile investment for those interested in the field from either a purely academic or an applied standpoint. Furthermore, a number of universities are attempting to start programs in bioinformatics (including mine), and this would be a fine text for a class in introductory bioinformatics. For the OS X user, problems with the text are few, and while the text was written with Linux in mind, most of the programs cited are available or are being actively ported to OS X. I would like to see future revisions of the text remedy some of the molecular biology deficiencies, include more discussion of the integration of non-traditional data into bioinformatics and, given my obvious biases in choice of computing platforms, address the use, advantages and implementation of OS X.