Apple and Bioinformatics

6-21-02 Bryan William Jones

Steve Jobs, during the introduction of the Xserve, specifically mentioned the bioscience market as one of the areas Apple is now focusing on. Despite the current economic downturn in the biotech market, many folks are talking up bioscience as the next big growth market for the coming decade. The payoffs could be astounding: some short-term estimates have the market reaching $2 billion within the next couple of years, and the long-term payoffs are expected to be astronomical. I expect that the big money will most likely come in agribusiness first, but science related to the human condition will certainly get most of the press. So, how does Apple Computer fit into bioscience outside of investment portfolios? The answer is bioinformatics. What is bioinformatics, you ask? In a nutshell, bioinformatics is ultimately the study of biological processes: how they interact, how they function, and how they would be predicted to function given enough data. A Grand Unified Theory of bioinformatics, if you will. A less grandiose and perhaps less precise definition is that bioinformatics is the study of biological information and the management of that information.

Bioinformatics really started to come together about 15 years ago, but it is still an emerging field at the nexus of computer science, chemistry, genetics, physiology, anatomy, pharmacology, epidemiology, medicine, engineering, virology, microbiology, pathology, mathematics, statistics, information management and molecular biology. All of these fields are contributing rapidly exploding amounts of information, and making sense of all of it and communicating the results is what bioinformatics is all about. In fact, the advent of bioinformatics is giving rise to new fields such as pharmacogenetics, the tailoring of drug treatments to specific individuals.

For a number of reasons, the major contributor of information to the field right now is genetics. Genetic and protein sequences are much easier to encode for large-scale analysis, and thus easier to interpret, than other types of data such as tertiary or quaternary protein structure, histology or biochemical pathways. Genetic sequences are easy to manage because each base can be represented by a single letter. What one sees when examining a genetic sequence from DNA, for example, is an endless stream of the letters A, G, C, and T, representing adenine, guanine, cytosine, and thymine, the four bases found in DNA.
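To make the "stream of letters" idea concrete, here is a minimal Python sketch of how a DNA sequence can be held as an ordinary text string and summarized. The function names and the example sequence are made up for illustration; they are not taken from any of the packages discussed in this article.

```python
from collections import Counter


def base_counts(sequence):
    """Count how often each of A, C, G and T occurs in a DNA string."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_content(sequence):
    """Return the fraction of A/C/G/T letters that are G or C."""
    counts = base_counts(sequence)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / total


# A made-up sequence, for illustration only.
example = "ATGGCGTACGCTTAA"
print(base_counts(example))
print(round(gc_content(example), 3))
```

Because the data is just text, simple questions like these need nothing more than string handling, which is a large part of why sequence data scaled up so quickly.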

You are probably familiar with the relatively recent announcement of the human genome being sequenced. What has been accomplished here is that all of the A's, T's, G's and C's in the human genome have been placed in roughly the appropriate places by a consortium of both private and public research groups. What all of these base pairs mean is another set of problems that needs to be unraveled. For instance, where do genes begin and end in these sequences of letters? Which regions encode proteins and which do not? These are only two of literally millions of questions that can now be asked, which illustrates that this rough draft is only the beginning. It is also only one of many sequenced genomes that together will lead to a more complete understanding of organismal biology.
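As a rough illustration of the "where do genes begin and end" question, the Python sketch below scans a DNA string for open reading frames: stretches that start with the codon ATG and run, three letters at a time, to one of the stop codons TAA, TAG or TGA. This is a deliberately naive approach and only an assumption about how one might start. It looks at the forward strand only and ignores introns, regulatory regions and sequencing errors, all of which real gene-finding software has to deal with. The function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=2):
    """Find open reading frames on the forward strand of a DNA string.

    Returns a sorted list of (start, end) index pairs, where start is the
    index of the A in ATG and end is one past the last letter of the stop
    codon. min_codons is the minimum number of codons before the stop,
    counting the ATG itself.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):  # three possible reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


# A made-up sequence, for illustration only.
example = "CCATGGCGTACTAAGG"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

Even this toy version shows why the problem is computational: every position in a genome of billions of letters has to be examined in several frames before any biology can be inferred.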

Other genomes sequenced to date include those from the domains archaea, bacteria and eukarya, with significant rewards coming out of each of them. For example, aside from the benefit of a basic science understanding of biology and biological mechanisms, the study of archaea could lead to novel enzymes. The payoffs for bacteria include an understanding of pathologic mechanisms important in human illness, an understanding of photosynthetic mechanisms important for ecology and agribusiness, the development of novel enzymes, and the industrial production of drugs, among other uses such as creating bacteria that can scavenge oil spills and better deal with other environmental catastrophes. Some of the eukaryotes sequenced to date include yeast, the worm C. elegans, and the human. The benefits of sequencing the yeast genome include a better understanding of the nature of inheritance. In addition to its importance in the creation of certain beverages, this organism is a powerful tool for understanding biological problems, for example in assays like the yeast two-hybrid system used by Myriad Genetics. C. elegans, like D. melanogaster, is a model system for understanding all aspects of biology including genetics, physiology, and development, among others. Work with the genomes of a variety of other systems including the human, mouse, rat, and zebrafish will contribute to a better understanding of normal and abnormal biology, perhaps leading to cures for pathologies such as cancer and inherited blindness. This discussion, of course, has not even taken into account the benefits of sequencing genomes such as rice, beans and corn, among others, for the agribusiness sector.

Now that we have a superficial overview of what bioinformatics is, where does Apple fit into this and why are they interested in bioscience? Traditionally, the companies that have been important to bioinformatics have been organizations like Agilent, Compaq (now merged with HP), IBM, SGI, and Sun. All of these companies have hardware and software products that have been used by bioinformatics scientists, and the common feature of much of this has been the operating system, which turns out to be UNIX of various flavors. Recently, however, Linux has been making inroads into many areas of science, and folks in the bioinformatics communities have not ignored the potential of an open-source OS that can run on less expensive hardware. Read "Developing Bioinformatics Computer Skills" from O'Reilly for a well-written, concise (admittedly Linux-centric) introduction to the basics of bioinformatics and the use of computational resources to solve problems, and stay tuned for a review in the coming weeks.

However, given the advent of OS X, its open source component Darwin, and now the Xserve, Apple has become a viable alternative for many scientists. Bioinformatics research is an ideal market for the integration of Apple hardware and software, which allows for more efficient and productive work at lower cost, leading to a better return on investment for industry, the investor and the taxpayer.

As a long-time user of the Macintosh in the sciences, I have found it frustrating that in order to get my work accomplished, I was always having to switch back and forth between my Macintosh, a Windows machine and UNIX boxes of various flavors, cramping my workspace with multiple CPUs and monitors. Additionally, the cost of maintaining current Wintel, SGI and Macintosh hardware and software at all times was a drain on the budget. With the advent of OS X, which is a true UNIX, I started to become interested in the possibility of a Macintosh replacing many of these other systems, thus simplifying the workflow, clearing up much-needed space and allowing me to perform my research in one computational environment. I should note that cost considerations were also important, given that a current SGI O2 will cost upwards of $20,000 and an SGI Octane will run anywhere from $25,000 to $40,000 and up, with yearly maintenance contracts costing from the hundreds to thousands of dollars. For this price, I can equip entire labs with Macintosh computers running OS X.

The beauty of having a true UNIX as the Mac OS is that there is a huge amount of UNIX code out there for just about any problem in the sciences that one can imagine. In many cases, getting it to run on the Mac is as easy as a recompile. However, even though porting many UNIX applications to OS X is fairly trivial, not everyone is competent or comfortable doing so. To remedy this, a number of individuals and groups are actively porting applications to OS X at a prodigious rate. Some, like OpenOSX, are bringing these UNIX applications to the Mac in Macintosh style with easy-to-use double-click installers. As for bioinformatics applications, a number of programs critical to the field have been ported recently by William Van Etten of Blackstone Computing, and there is even a CD prepared by the Catapult Consortium that contains several popular programs, already compiled and pre-packaged for easy installation by the OS X Installer utility. For those familiar with such applications, these include:

AGNCBI: A version of NCBI’s toolbox optimized for the G4 CPU.

ClustalW: Performs multiple genetic sequence alignment.

EMBOSS: An entire suite of applications including apps for sequence alignment, rapid database searching with sequence patterns, protein motif identification including domain analysis, nucleotide sequence pattern analysis, codon usage analysis for small genomes, rapid identification of sequence patterns in large-scale sequence sets, presentation tools for publication, and much more.

FASTA: Compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library.

HMMER: Profile hidden Markov models for biological sequence analysis.

PHYLIP: A package of programs for inferring phylogenies.

Primer3: Primer3 picks primers for PCR reactions, considering as criteria: oligonucleotide melting temperature, size, GC content, and primer/dimer possibilities, PCR product size, positional constraints within the source sequence, and miscellaneous other constraints.

Wise2: Wise2 is a package specialized in comparing DNA sequences at the level of their conceptual translation, even if the DNA contains introns or sequencing errors.
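To give a feel for the kind of criteria a primer-picking tool such as Primer3 weighs, here is a small Python sketch that checks a candidate primer's length and GC content and estimates its melting temperature with the simple Wallace rule (2 degrees C for each A or T, 4 degrees C for each G or C). This is not Primer3's actual algorithm, which uses more sophisticated models. The function names and all of the threshold values below are assumptions chosen purely for illustration.

```python
def gc_fraction(primer):
    """Return the fraction of letters in the primer that are G or C."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Estimate melting temperature in degrees C using the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C); a rough rule of thumb intended for
    short oligonucleotides only.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def passes_basic_checks(primer, min_len=18, max_len=25,
                        min_gc=0.4, max_gc=0.6):
    """Apply illustrative length and GC-content limits (assumed values)."""
    if not min_len <= len(primer) <= max_len:
        return False
    return min_gc <= gc_fraction(primer) <= max_gc


# A made-up primer, for illustration only.
candidate = "ATGCATGCATGCATGCATGC"
print(len(candidate), gc_fraction(candidate), wallace_tm(candidate))
print(passes_basic_checks(candidate))
```

A real tool layers many more constraints on top of these, such as primer-dimer formation and product size, but the basic pattern of scoring a string against numeric criteria is the same.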

Thus, with the Macintosh, we scientists can run our UNIX code originally written for SGI, Sun, IBM, Compaq, etc. right alongside general productivity software such as Adobe Photoshop, Microsoft Office, your email application of choice, statistics software, molecular modeling software, GIS software, your web browser of choice and so on. AND you get true plug and play compatibility along with the traditional Macintosh ease of use, things sorely lacking in other variants of UNIX.

Along with OS X and a G4 workstation with a flat panel monitor or two, the introduction of the Xserve is the other critical solution to many needs in the sciences. We now have a powerful 1U server with very compelling features at an attractive price point. The Xserve has out-of-the-box support for clients of all flavors including Linux, UNIX, Windows or Macintosh, with unlimited user access. In other words, there is no user tax as there is with other solutions such as Microsoft licensed servers. Additionally, with the Xserve, if your research is CPU intensive, you have the option of dual G4s; if your research is throughput intensive, you have throughput exceeding 500 Mb/s; and if your work is storage intensive, you can cram almost half a terabyte into a 1U space. Furthermore, there are some really nice touches such as FireWire ports on the front and back and dual USB ports, and for those who prefer managing via serial console, there is even accommodation made for you.

All in all, I think that Apple has given scientists, and bioinformatics researchers in particular, a set of powerful and convenient tools with which to perform their work. I look forward to Apple getting back into the sciences, allowing more work to be accomplished for less money and in better style.
