Why Mesquite was made
We give two answers, the practical and
the poetic, and a comment on the relationship
between MacClade and Mesquite.
The practical answer
Mesquite represents a new approach to computing for evolutionary
biology. In recent years there has been a proliferation of computer
programs for phylogenetic analysis, each designed for some particular
analysis (e.g., see Felsenstein's
compilation of programs). As these often involve unique file
formats and user interfaces, it is difficult for users to move
from one to another. Users tend to become constrained to a few
familiar analyses, since any given program can't do everything,
and each program has costs in learning. As a programmer one would
like to respond by making a program that does everything, but
there are now too many analyses available or conceivable for a
single programmer or programming team to keep up with. We have seen
the impact of these constraints with MacClade: some users perform
particular analyses in MacClade not because they are the most
appropriate analyses for their questions, but simply because they
are available in a familiar program. We would like to add more
flexibility to MacClade, but in a monolithic program this can
be difficult to do, and even if it were easy, there are more proposed
methods than we could maintain in MacClade.
Hence, our goal was to design a general system for phylogenetic
computing to which different programmers could contribute modules.
Bringing different analytical tools into a common system increases
possible analyses more than additively. In the end, the system
has grown beyond being strictly phylogenetic, including capabilities
for calculations involving characteristics of many organisms (e.g.,
population genetics and morphometrics) that need not involve phylogeny.
A second goal of Mesquite is to provide a graphical user interface
that will operate, more or less without modification, under different
operating systems (being written in Java).
Modularity and Flexibility
"Modularity" in computer programming might follow different
models. It could follow the "Mr.
Potato Head" model, in which there is a central program
to which different peripheral calculations can be attached in
specific places. This allows useful, but limited, flexibility.
Or, modularity could follow the "Lego"
model, in which building blocks are attached to other building
blocks, and so on indefinitely. This allows nearly unlimited flexibility.
Mesquite's modularity is something of a hybrid between these: there
is a (small) central starting point to which modules attach, but
from there modules can be attached to modules attached to modules,
indefinitely, leading to considerable flexibility in the analyses
that can be constructed.
To give an idea of the flexibility, consider the calculation
of the parsimony score of a tree, the treelength. A treelength-calculating
module takes a tree as input and returns its length. Such a module
belongs to the general
class of modules that return a number when passed a tree. Other
modules belonging to this class ("NumberForTree") could
return the likelihood of the tree, or a measure of the asymmetry
of the tree's branching, or a measure of the tree's discordance
with a containing species tree. A Tree Legend module can be written
(and has been) that displays the treelength in a legend in the
tree window, but the Legend module is designed so that the user
can choose to display any other number for the tree, such as its
likelihood, asymmetry, or discordance. If a programmer creates
a new module to calculate a number for a tree such as the longest
branch-length path from root to tip, and a user installs the module,
then the longest path measurement would automatically become another
option for the tree legend.
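The Tree Legend example above can be sketched in Java roughly as follows. This is only an illustrative sketch, not Mesquite's actual source: the Node class, the NumberForTree interface as written here, the two calculation classes, and the installedModules list standing in for module discovery are all assumptions made for the example. The point it shows is that the legend code depends only on the shared contract, so a newly added calculation becomes a legend option without the legend changing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class TreeLegendSketch {
    /** Minimal rooted tree (hypothetical): a branch length plus zero or more children. */
    static final class Node {
        final double branchLength;
        final Node[] children;
        Node(double branchLength, Node... children) {
            this.branchLength = branchLength;
            this.children = children;
        }
    }

    /** Hypothetical contract: any module that returns a number when passed a tree. */
    interface NumberForTree {
        String name();
        double calculate(Node root);
    }

    /** Sum of all branch lengths in the tree. */
    static final class TotalBranchLength implements NumberForTree {
        public String name() { return "Total branch length"; }
        public double calculate(Node node) {
            double total = node.branchLength;
            for (Node child : node.children) total += calculate(child);
            return total;
        }
    }

    /** Longest branch-length path from root to tip. */
    static final class LongestPath implements NumberForTree {
        public String name() { return "Longest root-to-tip path"; }
        public double calculate(Node node) {
            double longestBelow = 0.0;
            for (Node child : node.children) {
                longestBelow = Math.max(longestBelow, calculate(child));
            }
            return node.branchLength + longestBelow;
        }
    }

    /** The legend knows only the interface, never a concrete calculation. */
    static String legendLine(NumberForTree module, Node tree) {
        return String.format(Locale.ROOT, "%s: %.2f", module.name(), module.calculate(tree));
    }

    /** Stand-in for module discovery: "installing" a module means adding it here. */
    static List<NumberForTree> installedModules() {
        List<NumberForTree> modules = new ArrayList<>();
        modules.add(new TotalBranchLength());
        modules.add(new LongestPath());
        return modules;
    }

    /** Example tree ((A:1,B:2):1,C:1.5) with a zero-length root branch. */
    static Node exampleTree() {
        return new Node(0.0,
                new Node(1.0, new Node(1.0), new Node(2.0)),
                new Node(1.5));
    }

    public static void main(String[] args) {
        Node tree = exampleTree();
        // Every installed module is automatically a legend option.
        for (NumberForTree module : installedModules()) {
            System.out.println(legendLine(module, tree));
        }
    }
}
```

Adding a third class that implements NumberForTree and registering it in installedModules would add a third legend line; legendLine itself would not need to change.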
The Tree Legend is not the only place where analyses could use
numbers for trees. A charting module could display the numbers
calculated for a whole series of trees, or a tree search module
could use the numbers to find a tree with minimum or maximum values
for the number. When such modules are made, they can automatically
have access to whatever NumberForTree modules are available. Thus,
the chart could show treelength, or likelihood, or asymmetry,
or discordance, or longest path. Likewise, the tree search module
could seek to optimize any of those. If a programmer makes a new
module to analyze numbers for trees, then suddenly all existing
NumberForTree modules have a new context in which they can be
analyzed. If a new NumberForTree module is made, it will appear
as a new option under each of the modules making use of NumberForTree.
Hence the number of alternative analyses rises as the product
of numbers of modules of different interacting types.
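The multiplicative growth described here can be illustrated with a small Java sketch. Everything in it is hypothetical: the Tree stand-in (reduced to an array of branch lengths for brevity), the TreeNumberUser interface, and the two maps that play the role of installed modules are assumptions for the example, not Mesquite's API. With two modules that use numbers for trees and three modules that calculate them, six analyses are available without any of the modules knowing about the others.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

final class ModuleProductSketch {
    /** Stand-in for a tree: only its branch lengths (a simplification for brevity). */
    static final class Tree {
        final double[] branchLengths;
        Tree(double... branchLengths) { this.branchLengths = branchLengths; }
    }

    /** Hypothetical consumer contract: analyzes trees through any number-for-tree calculation. */
    interface TreeNumberUser {
        String analyze(List<Tree> trees, ToDoubleFunction<Tree> numberForTree);
    }

    /** The "number for tree" modules currently installed. */
    static Map<String, ToDoubleFunction<Tree>> numberModules() {
        Map<String, ToDoubleFunction<Tree>> modules = new LinkedHashMap<>();
        modules.put("total length", t -> Arrays.stream(t.branchLengths).sum());
        modules.put("longest branch", t -> Arrays.stream(t.branchLengths).max().orElse(0.0));
        modules.put("branch count", t -> t.branchLengths.length);
        return modules;
    }

    /** Index of the tree with the smallest value; assumes a non-empty list. */
    static int minimizingIndex(List<Tree> trees, ToDoubleFunction<Tree> numberForTree) {
        int best = 0;
        for (int i = 1; i < trees.size(); i++) {
            if (numberForTree.applyAsDouble(trees.get(i))
                    < numberForTree.applyAsDouble(trees.get(best))) {
                best = i;
            }
        }
        return best;
    }

    /** The modules that make use of numbers for trees. */
    static Map<String, TreeNumberUser> userModules() {
        Map<String, TreeNumberUser> users = new LinkedHashMap<>();
        users.put("chart", (trees, number) -> {
            StringBuilder values = new StringBuilder();
            for (Tree t : trees) values.append(number.applyAsDouble(t)).append(' ');
            return values.toString().trim();
        });
        users.put("search", (trees, number) ->
                "minimum at tree index " + minimizingIndex(trees, number));
        return users;
    }

    /** Every user module paired with every number module. */
    static List<String> allAnalyses(List<Tree> trees) {
        List<String> analyses = new ArrayList<>();
        for (Map.Entry<String, TreeNumberUser> user : userModules().entrySet()) {
            for (Map.Entry<String, ToDoubleFunction<Tree>> number : numberModules().entrySet()) {
                analyses.add(user.getKey() + " of " + number.getKey() + ": "
                        + user.getValue().analyze(trees, number.getValue()));
            }
        }
        return analyses;
    }

    static List<Tree> exampleTrees() {
        return Arrays.asList(new Tree(1.0, 2.0, 0.5), new Tree(0.5, 0.5), new Tree(3.0, 1.0, 1.0, 1.0));
    }

    public static void main(String[] args) {
        for (String analysis : allAnalyses(exampleTrees())) System.out.println(analysis);
    }
}
```

Adding one entry to either map adds a whole row or column of analyses, which is the sense in which the count rises as a product rather than a sum.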
Of course, the trees used had to come from somewhere. One module
might supply the trees stored in a file, another might simulate
trees using a simple Markovian model of speciation and extinction,
another might simulate trees as gene trees coalescing within a
species tree. Characters likewise might come from a stored matrix,
or might be simulated by a stochastic model of evolution, or
might represent reshufflings of existing characters. This means
that any calculations using trees or characters can either do
their calculations on observed data and reconstructed trees, or
can derive null distributions under stochastic models. The calculations
don't have to do anything special to achieve this flexibility;
they simply let the user choose the sources of trees and characters.
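A hedged Java sketch of this idea follows. The TreeSource interface, the StoredTrees and SimulatedTrees classes, and the choice of statistic (Colless imbalance, the sum over internal nodes of the absolute difference in tip counts between the two daughter clades) are assumptions for illustration, not Mesquite's own classes. The simulator uses a pure-birth process with no extinction, which is simpler than the speciation and extinction model mentioned above. The calculation in valuesFor is identical whether the trees are observed or simulated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

final class TreeSourceSketch {
    /** Binary tree node; a tip has no children. */
    static final class Node {
        Node left, right;
        Node() {}
        Node(Node left, Node right) { this.left = left; this.right = right; }
        boolean isTip() { return left == null; }
    }

    /** Hypothetical contract: anything that can supply trees. */
    interface TreeSource {
        int numberOfTrees();
        Node getTree(int index);
    }

    /** Trees that already exist, e.g. read from a file. */
    static final class StoredTrees implements TreeSource {
        private final List<Node> trees;
        StoredTrees(List<Node> trees) { this.trees = trees; }
        public int numberOfTrees() { return trees.size(); }
        public Node getTree(int index) { return trees.get(index); }
    }

    /** Pure-birth simulation: repeatedly split a uniformly chosen tip. */
    static final class SimulatedTrees implements TreeSource {
        private final int tips;
        private final int replicates;
        private final long seed;
        SimulatedTrees(int tips, int replicates, long seed) {
            this.tips = tips; this.replicates = replicates; this.seed = seed;
        }
        public int numberOfTrees() { return replicates; }
        public Node getTree(int index) {
            Random random = new Random(seed + index); // reproducible per replicate
            Node root = new Node();
            List<Node> currentTips = new ArrayList<>();
            currentTips.add(root);
            while (currentTips.size() < tips) {
                Node chosen = currentTips.remove(random.nextInt(currentTips.size()));
                chosen.left = new Node();
                chosen.right = new Node();
                currentTips.add(chosen.left);
                currentTips.add(chosen.right);
            }
            return root;
        }
    }

    static int countTips(Node node) {
        return node.isTip() ? 1 : countTips(node.left) + countTips(node.right);
    }

    /** Colless imbalance: sum over internal nodes of |tips(left) - tips(right)|. */
    static double collessImbalance(Node node) {
        if (node.isTip()) return 0.0;
        return Math.abs(countTips(node.left) - countTips(node.right))
                + collessImbalance(node.left) + collessImbalance(node.right);
    }

    /** The calculation does nothing special: it just asks the source for trees. */
    static double[] valuesFor(TreeSource source, ToDoubleFunction<Node> number) {
        double[] values = new double[source.numberOfTrees()];
        for (int i = 0; i < values.length; i++) {
            values[i] = number.applyAsDouble(source.getTree(i));
        }
        return values;
    }

    /** A fully unbalanced four-tip tree, (((a,b),c),d). */
    static Node caterpillar4() {
        return new Node(new Node(new Node(new Node(), new Node()), new Node()), new Node());
    }

    public static void main(String[] args) {
        TreeSource observedSource = new StoredTrees(Collections.singletonList(caterpillar4()));
        double observed = valuesFor(observedSource, TreeSourceSketch::collessImbalance)[0];
        double[] nullValues = valuesFor(new SimulatedTrees(4, 1000, 42L), TreeSourceSketch::collessImbalance);
        int atLeastAsExtreme = 0;
        for (double value : nullValues) if (value >= observed) atLeastAsExtreme++;
        System.out.println("Observed imbalance: " + observed);
        System.out.println("Fraction of simulated trees at least as imbalanced: "
                + (double) atLeastAsExtreme / nullValues.length);
    }
}
```

Swapping StoredTrees for SimulatedTrees is the only change needed to go from an observed value to a null distribution; collessImbalance and valuesFor are untouched.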
(For more details about modularity, see How
Mesquite works.)
A community of programmers
Our hope is that the building-block style of the Mesquite system
will encourage programmers to write modules for their own favorite
analyses. Another attraction of the Mesquite system is that many
of the details of reading and writing of files, user interface
and graphical display are already taken care of, so the programmer
need worry only about a single calculation. The system is built
in Java and is therefore platform independent. It is also possible
for programmers to link in code written in C, C++, or some other
language.
We have attempted to design the system so that a programmer's
efforts can be recognized as an independent, citable contribution.
Modules or suites of modules can have their own names and manuals,
and can be distributed and cited separately. They simply run within the
Mesquite system.
Mesquite source code is available for download.
This allows other programmers to modify existing source to create
new modules.
The poetic answer
The goals of Mesquite are these:
To change the economics of imagination in evolutionary biology
There are three ways we envision Mesquite stimulating imaginative
ideas and their successful spread:
- Stimulating the creation of ideas: analyses. With multiple
alternative modules available for various parts of an analysis,
and with modules specializing in questions from various branches
of evolutionary biology (e.g., phylogenetics, molecular evolution,
population genetics, geometric morphometrics) the diversity
and scope of analyses that can be constructed by combining different
modules is great. Individual users can carry their imaginations
through to an analysis that no one has tried previously. Indeed,
Mesquite, by offering the alternatives to be combined, doesn't
merely provide analytical tools for questions that have existed:
it suggests and provokes new questions.
- Stimulating the creation of ideas: biology. As does
MacClade, Mesquite has an
emphasis on visualization and exploration. An idea (whether
a particular hypothesis about the evolutionary history of a
group, or a stochastic model of a process) can be followed
through to its consequences and visualized. A biologist can
ask "What if this were the phylogenetic tree?" and
a character's evolution can be reconstructed or simulated on
this tree, and the results visualized. A biologist can ask "What
if the population had population sizes fluctuating in this way?",
and coalescence can be simulated, and the results visualized.
In providing users with the tool to ask "What if?"
questions, Mesquite provides an extension of the imagination.
Such tools are vital in a field whose ideas have consequences
that are difficult to predict or grasp without the aid of a
computer.
- Enhancing the efficient distribution of ideas: programs.
The imagination of theoreticians and programmers has produced
many valuable ideas for approaches and methods, and many valuable
programs to implement them. However, some of the ideas haven't
been translated into programs, and many of the programs haven't
been explored and used as much as they deserve. We don't
know, as a field, how many important ideas will lie unused for
decades until they are rediscovered. By allowing the programmer
to focus on the precise idea proposed (Mesquite providing much
of the housekeeping code for the programmer), Mesquite may allow
some ideas that might otherwise never have been implemented to be
realized as tools. By providing a fairly user-friendly context in which
modules can operate, Mesquite may encourage some programs to
be used more broadly and more easily than otherwise.
To continue to promote a phylogenetic perspective in evolutionary biology
The last few decades have seen the realization
of the importance of viewing organismal diversity and evolution
in the light of phylogeny. This revolution is analogous to and
as fundamental to its field as the revolution in cosmology from
a Newtonian view of space to an Einsteinian view of space (Maddison
and Pérez, 2000). Just as mass curves space, phylogeny
has curved the space of biological diversity, providing a distortion
on the distribution of traits of organisms we see around us. MacClade
and Mesquite are both designed to provide a corrective lens, to
help us to see organisms and their traits in their natural orientation
within this curved space along the phylogeny. Mesquite's modularity
allows this perspective to be extended to fields such as morphometrics,
in which a phylogenetic perspective has relatively recently begun
to suffuse the field.
Which to use, Mesquite or MacClade?
Version 4 of MacClade (Maddison
& Maddison, 2000) was released in October 2000, and the MacOS
X compatible version 4.04 in July 2002. (It is now at version
4.08.) The reader might wonder why we have been working on two
different programming efforts, and whether they are intended for
different uses. Although Mesquite's extensibility means that
it has many more features than MacClade, MacClade has some features
that Mesquite lacks. Calculations and functions of MacClade's
tree window not currently available in Mesquite include particular
charts (e.g., Changes and Stasis), the concentrated
changes test, some of the parsimony options (irreversible, stratigraphic,
Dollo), a detailed Trace All Changes mode, and some options for
tree printing (e.g., saving a tree to the clipboard). Prior
to version 2.5 of Mesquite, some of the most significant advantages
of MacClade 4 over Mesquite were the tools in its data editor for
managing molecular sequence data; these have, however, been added
to Mesquite 2.5. MacClade also has
a simpler interface, which some users might prefer. We imagine
that in the future MacClade will give way to Mesquite
as Mesquite matures. For the next while, however, the
two will coexist and be complementary.
References
Maddison, D.R. and W.P. Maddison. 2000. MacClade version 4:
Analysis of phylogeny and character evolution. Sinauer Associates,
Sunderland, Massachusetts.
Maddison, W. and T. Pérez. 2000. Biodiversidad y lecciones
de la historia. In: Enfoques contemporáneos para el estudio
de la biodiversidad [Hernández, H.M., A. García
Aldrete, F. Álvarez and M. Ulloa, editors]. Instituto de
Biología, UNAM, Mexico. Pp. 201-220.