Systematic Biology 2008 57(4):658-660; doi:10.1080/10635150802303458

© 2008 Society of Systematic Biologists

Phylogenetic Trees Made Easy: A How-to Manual, third edition.—Barry G. Hall. 2008. Sinauer Associates, Sunderland, Massachusetts. xiv + 230 pp. ISBN 978-0-87893-310-5. $US39.95 £24.99 (paperback)

David A. Morrison

Department of Parasitology (SWEPAR), National Veterinary Institute and Swedish University of Agricultural Sciences, 751 89 Uppsala, Sweden; E-mail: David.Morrison@bvf.slu.se

Giving a book this title is asking for trouble: phylogenetic analysis is one of the hardest things that a biologist can try to do (because there can be no manipulative experiment to test the conclusions from the study of unique historical events), and so claiming that this process can be made "easy" seems very much like a contradiction in terms. It comes as no surprise, then, that the two previous editions of this book (2001, 2004) received reviews that varied from enthusiastic (Van de Peer, 2001; Nepokroeff, 2002; Fitch, 2003; Smith, 2003) to noncommittal (Meyer, 2001; Elgar, 2002; Smith, 2005) to disappointed (Wright, 2002; Kumar, 2003; Ronquist, 2003; Igic, 2005) to downright critical (Macey, 2002; Grant et al., 2003; Specht and Stevenson, 2003). A look at the third edition of this book is thus a worthwhile exercise, to see whether anything has changed with the various revisions.

Sadly, this review is not going to be, in essence, any different to those of the latter two groups. It is a pity to criticize a book like this, because what it tries to do it does rather well. It covers many relevant topics in a readable manner, the ideas are accessible, and the instructions are easy to follow. Ultimately, however, the book is far too uneven for me to claim that it is a particularly good one. It targets a specific group of researchers, and that target audience may see some value in it. Readers of this journal are not likely to be members of that audience.

I venture to suggest that there are three basic attitudes to phylogenetic analysis, which I will label as bioinformatic, protocol, and phylogenetic. The bioinformatic approach sees the analysis as an investigation of multivariate patterns in a data set, producing a tree as one means of representing those patterns. The focus is on the computational aspects of "statistical methodology," particularly algorithms and computer programs. The protocol approach sees the analysis as a component of a larger project, with the tree as simply one tool (of many) for investigating evolutionary patterns (often focusing on gene trees). The spotlight is on having an adequate protocol for the data analysis, analogous to the protocol used for laboratory work. In contrast, the phylogenetic approach sees the analysis as an end in itself, with the tree being a "true" representation of actual evolutionary history, rather than merely a stick diagram representing mathematical patterns. The focus is on getting the best tree possible (of the species not the genes), with full understanding of the value that almost always comes from taking extra time and care.

The distinction between these three attitudes is not trivial. For example, I have recently conducted a review of sequence-alignment practices in 26 journals covering the areas of phylogenetics, systematics, evolutionary biology, molecular biology, and microbiology. I found major differences in the practices between these areas, in spite of the fact that the boundaries between the areas are very hazy. In particular, the first two areas fit neatly into the phylogenetics category, whereas the latter three fit firmly into the protocol category.

Barry Hall's book is aimed squarely at the protocol group (he actually calls his book a "cookbook"). There is not enough computational information to interest the bioinformatics group and not enough consideration given to the uniqueness of each data set to satisfy the phylogenetics group. Most of the (many) books aimed at the bioinformatics group either treat phylogenetic analysis as one (possibly minor) aspect of a much broader class of analyses for molecular data (e.g., database searching, molecular modeling, etc.), or they treat phylogenetic analysis from a detailed algorithmic perspective. There are not many books aimed at the phylogenetics group (i.e., constructing species trees not gene trees), which is a pity.

For the protocol group, a phylogenetic analysis is usually part of a pipeline, with the sequence data going in at one end and a publishable analysis coming out at the other end. Ideally, this pipeline would be automated and fast. However, it is difficult to control the quality of the output if there is no manual quality control of the processing. High-quality phylogenetic analyses are not easy, because each data set has its own array of characteristics, and these characteristics must not be allowed to cause problems in the final output. Phylogenetic analysis requires careful thought and a great deal of understanding, if it is going to be effective. This is hard to reconcile with a fast, automated pipeline for analysis.

Barry Hall clearly cares about quality, but he does not make it the focus of his book, the focus instead being on speed and ease of use. The motivation is to get the reader started on phylogenetic analysis of a gene, not to produce analyses that will stand the test of time. The majority of the space in the book is devoted to screenshots from the computer programs, illustrating the recipes used in the tutorials, with little consideration of how to decide which recipe might be the one needed. If readers follow the recipes in the book and can adapt those recipes to their own data, then it is clear that they will get a tree based on each of the methods discussed. How good those trees will be for any particular user's data set is not so clear.

Introductory books are hard to write precisely because the answers to the key philosophical questions are never obvious. How much of a protocol is someone supposed to truly understand before they can be expected to use the protocol competently; and when in the process are they expected to acquire that understanding? Barry Hall has chosen his answers and, after three editions of his book, they are presumably carefully chosen answers. Unfortunately, reviewers of the previous editions have not always been impressed with those answers (e.g., Macey, 2002; Grant et al., 2003; Specht and Stevenson, 2003), and the answers have not changed much in this edition. Hall communicates the bare minimum of information possible for constructing a tree.

When dealing with data analyses, biologists sometimes produce a rather time-worn analogy with driving a car (rather than with cooking): in the same way that it is possible to safely and effectively drive a car without knowing very much at all about how the car works, it is possible to successfully use a computer program and a protocol to analyze scientific data without much knowledge about how the data analysis actually works. Unfortunately, this analogy has one fatal flaw: most car drivers know enough about cars to be able to recognize when their car is not working, or at least not doing what they want. For example, you don't need to know how the steering works in order to recognize that you have ended up at the train station when you were trying to drive to the supermarket.

In contrast, data analyses can produce complete nonsense without the user realizing it at all, if the analyses are not performed in an appropriate manner; and computerized analyses can produce such nonsense at a very fast rate indeed. The only effective way to recognize that an analysis has produced rubbish is by knowing how it should work in the first place—just because a program has produced a tree doesn't mean that the tree is a worthwhile representation of a phylogeny. It is a pity that so many biologists seem to put a lot of time and effort into collecting high-quality data (because they understand their laboratory protocol) and then analyze it in an unsuitable manner (because they do not understand their data analysis protocol). Data analysis is an iterative process of calculation and thought, not a pipeline.

So, a book based on protocols has an inbuilt limitation. If the car analogy is of any use at all, then it emphasizes that using a "cookbook" is like teaching someone to drive using a simulator rather than a real car. This book will get you started on phylogenetic analysis, but it does not tell you how much more there is to do. This is both its greatest strength and its greatest weakness. Ultimately, this duality results in the situation where each important topic is introduced but not enough understanding is provided to create competent program users. This is what generates the obvious unevenness in the book. Recipes are very useful if you simply want to repeat an analysis of your own, but they are not a good way to teach data analysis—recipes best document what you did, rather than what someone else should do.

One simple example will suffice to illustrate the unevenness. It is good for the book to have an appendix about data formats for different computer programs, because this is often the biggest frustration for both beginners and experts. Problems can arise in programs from: (1) differing interpretations of a format; (2) incompatible versions of a format; (3) incomplete implementations of a format; (4) differences between reading and writing a format; (5) poor interconversion of particular formats; and (6) incompatible characters as line-endings. Barry Hall's book rarely mentions aspects of problems 1 to 5 and only explicitly addresses problem 6, which may be the least frustrating one encountered by users (because it is relatively easy to deal with). Thus, the important issue of data formats is introduced but is not covered in an effective manner. The rest of the book suffers the same type of problem—the choice of topics to be discussed and their depth seem somewhat arbitrary.
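
To show why problem 6 is relatively easy to deal with, here is a minimal Python sketch that counts and then normalizes the line endings in a sequence file. The function names and the FASTA-like sample data are illustrative assumptions only; nothing here is taken from Hall's book or from any of the programs it discusses.

```python
import re


def count_line_endings(data: bytes) -> dict:
    """Count each line-ending style found in raw file bytes."""
    crlf = data.count(b"\r\n")
    # A bare CR or bare LF is one that is not part of a CRLF pair.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": cr, "LF": lf}


def normalize_line_endings(data: bytes, newline: bytes = b"\n") -> bytes:
    """Rewrite every CRLF, bare CR, or bare LF as `newline`."""
    # CRLF is listed first so that the pair is replaced once, not twice.
    return re.sub(rb"\r\n|\r|\n", lambda _match: newline, data)


if __name__ == "__main__":
    # Hypothetical file contents mixing Windows, old Macintosh, and Unix endings.
    mixed = b">seq1\r\nACGT\r>seq2\nTTGA\n"
    print(count_line_endings(mixed))
    print(normalize_line_endings(mixed))
```

Working on raw bytes rather than decoded text avoids any silent newline translation when the file is read. Problems 1 to 5 have no comparably short fix, because they depend on how each individual program reads and writes a given format.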

So, as far as the underlying philosophy is concerned, there is little difference between this edition and its predecessors (Zwickl, 2008). Indeed, very few of the criticisms of the previous editions seem to have been addressed in this edition. There isn't even a list of review articles and books, as an entrée into the rich literature on theory, which would have gone some way towards deflecting many of the previous criticisms.

On the other hand, the most obvious difference between the three editions is the computer programs that are used to implement the protocols. The first edition focused on Apple Macintosh computers, whereas the second edition broadened the scope to include Windows and Unix PCs. This new edition focuses on Windows PCs, with the expectation that the other two user groups will adapt to this situation (such as by using the Wine API, which is a long way short of usable on a Macintosh). So, the MEGA program is used for most of the analyses, along with PhyML for maximum likelihood and MrBayes for Bayesian analysis (the latter being the only hangover from the first two editions). Other programs are briefly mentioned in an appendix, but with the usual unevenness regarding the amount of information given and its correctness.

The switch from PAUP* to MEGA is based on the grounds that the latter is now second only to PAUP* in its number of annual literature citations (Kumar and Dudley, 2007), and that PAUP* has not been updated since before the previous edition of the book. Sadly, the author's enthusiasm for MEGA ("The data acquisition, alignment, and tree drawing functions of MEGA are so elegantly implemented, and so easy to use, that although I am a confirmed Macintosh user I purchased a Windows computer for the sole purpose of using MEGA" [p. 3]) is symptomatic of the whole book, because MEGA allows a data set to pass through Clustal and Neighbor-Joining without any thoughts passing through the head of the user. Ease of use can be a very good thing in computing, but it should not come at the expense of thinking, which is what inevitably happens unless the user is guided through a thorough exploration of their data before they do the analysis. PhyML is a less obvious choice than MEGA, as it has been shown to be less successful than some of its competitors (e.g., Morrison, 2007).

The closest competitor for this book is probably the one edited by Salemi and Vandamme (2003). The latter deals with many more computer programs, covers the theory in more depth, and includes a wider range of topics. It is not, however, a how-to tutorial, and thus cannot stand on its own as a practical guide to phylogenetic trees (Morrison, 2005). In this sense it does not compete directly with Barry Hall's book.

What Hall's book does, it does well; it just does not do enough. It is a primer for those who want to dabble in phylogenetic analysis, successfully providing a start-up guide for those people who are "learning to drive." There is more to driving a car than merely starting it, however.


References

    Elgar G. [Book review.] Brief. Funct. Genom. Proteom. (2002) 1:107–108.

    Fitch D. H. A. Recipe for success. Mol. Phylogenet. Evol. (2003) 27:161–162.

    Grant T., Faivovich J., Pol D. The perils of "point-and-click" systematics. Cladistics (2003) 19:276–285.

    Igic B. [Book review.] J. Hered. (2005) 96:469–470.

    Kumar S. MacTrees made easy. Mol. Phylogenet. Evol. (2003) 27:165–167.

    Kumar S., Dudley J. Bioinformatics software for biologists in the genomics era. Bioinformatics (2007) 23:1713–1717.

    Macey J. R. [Book review.] Q. Rev. Biol. (2002) 77:196–197.

    Meyer A. Growing trees from molecular data. Science (2001) 294:2297–2298.

    Morrison D. A. [Book review.] Syst. Biol. (2005) 54:984–986.

    Morrison D. A. Increasing the efficiency of searches for the maximum-likelihood tree in a phylogenetic analysis of up to 150 nucleotide sequences. Syst. Biol. (2007) 56:988–1010.

    Nepokroeff M. Seeing the forest for the (gene) trees. BioScience (2002) 52:531–533.

    Ronquist F. Molecular phylogenetics for dummies. Mol. Phylogenet. Evol. (2003) 27:163–164.

    Salemi M., Vandamme A.-M. The phylogenetic handbook: A practical approach to DNA and protein phylogeny (2003) Cambridge, UK: Cambridge University Press.

    Smith U. [Book review.] Syst. Bot. (2003) 28:465.

    Smith T. [Book review.] Syst. Bot. (2005) 30:683.

    Specht C. D., Stevenson D. W. Easy trees? Mol. Phylogenet. Evol. (2003) 27:168–171.

    Van de Peer Y. Phylogeny branches out. Nature (2001) 414:490.

    Wright F. [Book review.] Brief. Bioinformatics (2002) 3:429–431.

    Zwickl D. [Book review.] Q. Rev. Biol. (2008) 83:98.

