PUZZLE
Maximum likelihood analysis for nucleotide and amino acid alignments

PUZZLE is an application to reconstruct phylogenetic trees from molecular sequence data by maximum likelihood. It implements a fast tree search algorithm (quartet puzzling) that allows analysis of large data sets and automatically assigns estimates of support to each internal branch. Rate heterogeneity (invariable sites plus Gamma distributed rates) is incorporated in all models of substitution available (nucleotides: TN, HKY, F84, and submodels; amino acids: Dayhoff, JTT, mtREV24; nucleotide doublets: SH model). All model parameters including rate heterogeneity can be estimated from the data by maximum likelihood. PUZZLE also computes pairwise maximum likelihood distances as well as branch lengths for user specified trees. In addition, PUZZLE offers a novel method, likelihood mapping, to investigate the support of internal branches or the overall phylogenetic content of an alignment without computing an overall tree. PUZZLE is available free of charge from
Preferably, this documentation should be printed and read completely before using PUZZLE the first time. If you do not have the time to read this manual completely please do read at least the two sections Input/Output Conventions and Quick Start below. Then you should be able to use the PUZZLE program, especially if you have some experience with the PHYLIP programs. The other sections should be read at a later time, though.
Legal Stuff
The PUZZLE software and its accompanying documentation are provided as is, without guarantee of support or maintenance. The copyright holders make no express or implied warranty of any kind with respect to this software, including implied warranties of merchantability or fitness for a particular purpose, and are not liable for any damages resulting in any way from its use. Everyone is granted permission to copy and redistribute this software package and its accompanying documentation, provided that:
If you plan to redistribute PUZZLE on the Internet or on CD-ROM, please obtain permission from the authors beforehand. Permission to use portions of the source code will be granted on request.
Installation
The PUZZLE software is distributed in different ways depending on the target platform (MacOS, MS-DOS, UNIX, VMS). All packages contain the documentation, example input files, and the source code including makefiles. MacOS and MS-DOS executables are distributed with the corresponding archives. Please follow the instructions suitable for your system.

MacOS
Get the file puzzle.hqx. After decoding this BinHex file (this is done automatically on a properly installed system, otherwise use programs like "StuffIt Expander" or ask your local Mac expert) you will find a folder called "PUZZLE" on your hard disk. This folder contains the four subdirectories "MANUAL", "EXAMPLES", "PROGRAM", and "SOURCES". The "MANUAL" folder contains the documentation for the PUZZLE program in plain text and HTML format. The "EXAMPLES" folder contains example input files. The "PROGRAM" folder contains the Macintosh 68k FPU and PPC executables. The 68k FPU version needs a numerical coprocessor and does not run on 68000 Macintoshes. Note that 68k Macs without an FPU, as well as PowerMacs, will crash if you try to run the 68k FPU version on them. The PPC version is optimized for PowerMacs. The default memory partition is 3000K for both versions. If you get a memory allocation error during a PUZZLE run, increase its memory partition with the "Get Info" command of the Finder. The "SOURCES" folder contains the ANSI C sources of PUZZLE. The MacOS executables have been compiled with Metrowerks CodeWarrior 11. The corresponding project files are found in the "SOURCES" folder.

MS-DOS
Get the file puzzle.zip. After decoding this ZIP file (this is done automatically on a properly installed system, otherwise use programs like "WinZip" or ask your local PC expert) you will find a directory called "PUZZLE" on your hard disk. This directory contains the four subdirectories "MANUAL", "EXAMPLES", "PROGRAM", and "SOURCES".
The "MANUAL" folder contains the documentation files for the PUZZLE program in plain text and HTML format. The "EXAMPLES" folder contains example input files. The "PROGRAM" folder contains the MS-DOS executable "PUZZLE.EXE". This executable runs on all IBM PCs and compatible systems under MS-DOS. If a numerical coprocessor is present (which is strongly recommended!) the PUZZLE program will use it. The "SOURCES" directory contains the ANSI C sources of PUZZLE. The MS-DOS executable has been compiled with Borland Turbo C (version 3.0). If you want to recompile "PUZZLE.EXE" you can use the "MAKEFILE.TC" script in the "SOURCES" directory. If you are using Turbo C, the command

    MAKE -F MAKEFILE.TC

will compile "PUZZLE.EXE".

UNIX
Get the file puzzle.tar. If you received a compressed tar file (puzzle.tar.Z or puzzle.tar.gz) you have to decompress it first (using the "uncompress" or "gunzip" command). Then untar the file with

    tar xvf puzzle.tar

The newly created directory "PUZZLE" contains four subdirectories called "MANUAL", "EXAMPLES", "PROGRAM", and "SOURCES". The "MANUAL" directory contains the documentation files in plain text and HTML format. The "EXAMPLES" directory contains example input files. The "SOURCES" folder contains the ANSI C sources of PUZZLE. Switch to this directory by typing

    cd PUZZLE/SOURCES

If the C compiler on your UNIX system is called "cc" and is ANSI C compliant by default (this should be the case on most machines) just type

    make

and the executable "puzzle" is compiled and put into the "PROGRAM" directory. If your compiler is the GNU "gcc" compiler (GNU, Linux) type

    make -f makefile.gnu

In the current version of "gcc" you will get precisely one warning message ("constant is so large that it is unsigned"). Ignore this message; everything is OK. If you use an HP workstation type

    make -f makefile.hp

The HP compiler needs the switch "-Aa" to be ANSI C compliant.
If compilation fails, your compiler or its runtime library is most probably not ANSI compliant (e.g. old SUN compilers). In most cases you may succeed in compiling if you change some parameters in the "makefile". It may also be that the ANSI C type "unsigned long int" is not implemented in the compiler with the range 0-4294967295. Then you have to change compiler settings in the "makefile" as well. Ask your local UNIX expert for help. If the compilation of the PUZZLE program succeeded without any problem but running the "puzzle" executable does not start the PUZZLE maximum likelihood program, rename the executable to "puzzle31", as some UNIX systems come with a preinstalled game also called "puzzle".

VMS
If you are upgrading from an earlier version please remove all files belonging to the old version before continuing. Please read the following section completely before installing the PUZZLE program. Get the file puzzle.uue. After decoding this uuencoded ZIP file (using programs like "UUD" and "UNZIP", ask your local VMS expert for help) you will see a new directory "PUZZLE" on your computer containing three subdirectories called "EXAMPLES", "MANUAL", and "SOURCES". The "MANUAL" directory contains the documentation files for the PUZZLE program in plain text and HTML format. The "EXAMPLES" directory contains some example input files. The "SOURCES" folder contains the ANSI C sources of PUZZLE. Go to this directory by typing

    SET DEF [.PUZZLE.SOURCES]

Then run the command file "MAKEFILE.COM" by typing

    @MAKEFILE

This compiles the "PUZZLE.EXE" executable. A "PROGRAM" directory is created automatically as a subdirectory of the "PUZZLE" directory and the executable is moved into this directory. The PUZZLE program is now ready to run:

    RUN PUZZLE
Introduction
PUZZLE is an ANSI C application to reconstruct phylogenetic trees from molecular sequence data by maximum likelihood. It implements a fast tree search algorithm (quartet puzzling) that allows analysis of large data sets and automatically assigns estimates of support to each internal branch. Rate heterogeneity (invariable sites plus Gamma distributed rates) is incorporated in all models of substitution available (nucleotides: TN, HKY, F84, and submodels; amino acids: Dayhoff, JTT, mtREV24). All parameters including rate heterogeneity can be estimated from the data by maximum likelihood. PUZZLE also computes pairwise maximum likelihood distances as well as branch lengths for user specified trees. In addition, PUZZLE offers a novel method, likelihood mapping, to investigate the support of internal branches without computing an overall tree.
Input/Output Conventions

Sequence Input
PUZZLE requests sequence input in PHYLIP INTERLEAVED format (sometimes also called PHYLIP 3.4 format). Many sequence editors and alignment programs (e.g. CLUSTAL W) save data in this format. Take a look at the three example files in the "EXAMPLES" folder ("globin.a", "marswolf.n", "atp6.a") to see what the sequence alignment should look like. The default name of the sequence input file is "infile". If an "infile" is not present PUZZLE will prompt the user for an alternative file name. Sequence names in the "infile" are allowed to contain blanks but these blanks will internally be converted to underscores "_". Sequences can be in upper or lower case; any spaces or control characters are ignored. The dot "." is recognized as a matching character and can be used in all sequences (except, of course, the first sequence). Valid symbols for nucleotides are A, C, G, T and U, and for amino acids A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, and Y. All other visible characters (including gaps, question marks etc.) are treated as N (DNA/RNA) or X (amino acids). The first sequence in the data set is considered the default outgroup.

General Output
All results are written to the file "outfile". If the option "show all unresolved quartets" is invoked a file called "outqlist" is created listing all these quartets.

Distance Output
PUZZLE automatically computes pairwise maximum likelihood distances for all the sequences in the data file. They are written to the "outfile" and to the separate file "outdist". The format of "outdist" is PHYLIP compatible.

Tree Output
The quartet puzzling tree with its support values for the internal branches and with maximum likelihood branch lengths is plotted as an ASCII drawing in the "outfile". The same tree is written into the "outtree" file.
The output tree convention follows the one adopted by the CLUSTAL W team: the tree topology is described by the usual round brackets (a,b,(c,d)); and branch lengths are written after the colon a:0.22,b:0.33. To be able to display support values for each branch simultaneously with branch lengths they are written as internal labels, i.e. they follow directly after each node before the branch lengths. Here is an example:

    (Gibbon:0.1393, ((Human:0.0414, Chimpanzee:0.0538)99:0.0175, Gorilla:0.0577)98:0.0531, Orangutan:0.1003);

Note that the tree topology is always unrooted (no basal bifurcation). With TreeView and TreeTool it is possible to view this tree with its branch lengths AND simultaneously with the support values for the internal branches (here 98% and 99%). Note that the PHYLIP programs DRAWTREE and DRAWGRAM may also be used with the CLUSTAL W treefile format, but in the current version (3.5) they simply ignore the internal labels and print only the tree topology with branch lengths. All these tree drawing programs are also able to process multifurcating trees (PUZZLE may return a multifurcating tree if the data does not resolve all nodes).

Tree Input
PUZZLE optionally also reads input trees. The default name for the file containing the input tree is "intree" but if you choose the input tree option and there is no "intree" present you will be prompted for an alternative name. The format of the input tree file is identical to the tree in the "outtree" file. However, it is sufficient to provide the tree topology only; you don't need to specify branch lengths (which are ignored anyway) or internal labels (which are read, stored, and written back to the "outtree" file). The input tree need not be unrooted; it can also be rooted. The corresponding basal bifurcation will automatically be removed. It is important that sequence names in the input tree file do not contain blanks (use underscores!).
The tree also need not be completely resolved; it can be multifurcating. The format of the "intree" file is easy: just write the tree into the file (there is no additional number indicating how many trees follow, because PUZZLE reads only one tree).

Likelihood Mapping Output
PUZZLE also offers likelihood mapping analysis, a method to investigate support for internal branches of a tree or the overall phylogenetic content of an alignment without computing an overall tree, and to graphically visualize the phylogenetic content of a sequence alignment. The results of likelihood mapping are written to the general "outfile" as well as to a file called "outlm.eps". This file contains, in encapsulated PostScript format (EPSF), a picture of the triangle that forms the basis of the likelihood mapping analysis. You may print it out on a PostScript capable printer or view it with a suitable program. The "outlm.eps" file can be edited by hand (as it is plain text) or by drawing programs that understand the PostScript language such as Adobe Illustrator.
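The tree output convention described above (support values written as internal labels directly after a closing bracket) can be read back with a few lines of code. The following is a minimal Python sketch and not part of PUZZLE; the function name internal_support_values is hypothetical, and the sketch assumes that support values are integers, as in the example tree.

```python
import re

def internal_support_values(tree_text):
    """Return the internal-branch labels (support values) of a tree string.

    Assumption: labels are integers that directly follow a closing bracket
    and precede the colon of the branch length, as in the example tree.
    """
    # ")" followed by digits followed by ":" marks one labelled internal branch
    return [int(label) for label in re.findall(r"\)(\d+):", tree_text)]

example = ("(Gibbon:0.1393, ((Human:0.0414, Chimpanzee:0.0538)99:0.0175, "
           "Gorilla:0.0577)98:0.0531, Orangutan:0.1003);")
print(internal_support_values(example))
```

A tree that carries only a topology, such as (a,b,(c,d));, has no internal labels and yields an empty list.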
Quick Start
Prepare your sequence input file "infile" and, optionally, your tree input file "intree" as well. Then start the PUZZLE program. PUZZLE will automatically choose the nucleotide or the amino acid mode. If more than 85% of the characters (not counting - and ?) in the sequences are A, C, G, T, U or N, the sequences will be assumed to be nucleotides. If your data set contains amino acids PUZZLE recognizes whether you have amino acids encoded on mtDNA or on nuclear DNA, and selects the appropriate model of amino acid evolution. If your data set contains nucleotides the model of sequence evolution chosen is the HKY model. Parameters need not be specified; they will be estimated from the data by a maximum likelihood procedure. If PUZZLE detects an "intree" file it automatically switches to the input tree mode. Then a menu (PHYLIP "look and feel") appears with recommended options set. Usually the preselected options work well but you can still change all available options. If you want to incorporate rate heterogeneity you have to select this option ("w") explicitly, as rate heterogeneity is switched off by default. Then type "y" at the input prompt to start the analysis. You will see a number of status messages on the screen during computation. When the analysis is finished, the output files (e.g. "outfile", "outtree", "outdist", "outqlist", "outlm.eps") will be in the same folder as the input files.

To obtain a high quality picture of the output tree most conveniently, we recommend using the TreeView program by Roderic Page, which is available free of charge and runs on personal computers (Macintosh and MS-Windows). It can be retrieved from
TreeView understands the CLUSTAL W treefile conventions, reads multifurcating trees and is able to display branch lengths and support values for each branch simultaneously. Open the "outtree" file with TreeView, choose "Phylogram" to draw branch lengths, and select "Show internal edge labels". On a SUN workstation you can use the TreeTool program to display and manipulate PUZZLE trees (ftp://rdp.life.uiuc.edu/rdp/programs/TreeTool).
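The automatic choice between nucleotide and amino acid mode described in this section rests on a simple 85% rule. The following Python sketch illustrates that rule only; it is not PUZZLE's code. The function name looks_like_nucleotides is hypothetical, and how PUZZLE treats the matching character "." in this count is not stated in the manual, so the sketch simply counts it as an ordinary character.

```python
NUCLEOTIDE_SYMBOLS = set("ACGTUN")
IGNORED_SYMBOLS = set("-?")   # not counted, as stated in the text

def looks_like_nucleotides(sequences, threshold=0.85):
    """Return True if more than `threshold` of the counted characters
    are A, C, G, T, U or N (case-insensitive)."""
    counted = 0
    nucleotide_like = 0
    for seq in sequences:
        for ch in seq.upper():
            if ch.isspace() or ch in IGNORED_SYMBOLS:
                continue          # blanks, "-" and "?" are skipped
            counted += 1
            if ch in NUCLEOTIDE_SYMBOLS:
                nucleotide_like += 1
    if counted == 0:
        return False              # assumption: empty input is not nucleotide data
    return nucleotide_like / counted > threshold

print(looks_like_nucleotides(["ACGTACGT-?", "acgtnnuu"]))
print(looks_like_nucleotides(["MKVLAAGIVG"]))
```

The first call contains only nucleotide symbols once "-" and "?" are skipped; the second is a short protein fragment in which fewer than half of the letters happen to be A, C, G or T.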
Models of Sequence Evolution
Here we give a brief overview of the models implemented in PUZZLE. If you are allergic to mathematics you can skip this section, though the information is actually very important for an understanding of the parameters used in PUZZLE.

Models of Substitution
The substitution process is modelled as a reversible, time homogeneous, stationary Markov process. If the corresponding stationary nucleotide (amino acid) frequencies are denoted pi_i, the most general rate matrix for the transition from nucleotide (amino acid) i to j can be written as

$$R_{ij} = \begin{cases} Q_{ij}\,\pi_j & \text{for } i \neq j \\ -\sum_k Q_{ik}\,\pi_k & \text{for } i = j \end{cases}$$

The matrix Q_{ij} is symmetric with Q_{ii} = 0 (diagonals are zero). For nucleotides the most general model built into PUZZLE is the Tamura-Nei (TN) model. The matrix Q_{ij} for this model can be written

$$Q_{ij} = \begin{cases} \dfrac{4\,\tau\,\kappa}{\tau+1} & \text{for } i \to j \text{ pyrimidine transition} \\ \dfrac{4\,\kappa}{\tau+1} & \text{for } i \to j \text{ purine transition} \\ 1 & \text{for } i \to j \text{ transversion} \end{cases}$$

The parameter tau is called the "Y/R transition parameter" whereas kappa is the "Transition/transversion parameter". If tau is equal to 1, we get the HKY model (1985). Note that there is a subtle but important difference between the transition-transversion parameter, the expected transition-transversion ratio, and the observed transition-transversion ratio. The transition-transversion parameter simply is a parameter in the rate matrix. The expected transition-transversion ratio is the ratio of actually occurring transitions to actually occurring transversions, taking into account the nucleotide frequencies in the alignment. Due to saturation and multiple hits not all substitutions are observable. Thus, the observed transition-transversion ratio counts observable transitions and transversions only. If the base frequencies in the HKY model are in addition assumed to be homogeneous (pi_i = 0.25), HKY reduces further to the Kimura model.
In this case kappa is identical to the expected transition/transversion ratio. If kappa is then set to 0.5 the Jukes-Cantor model is obtained. The F84 model as implemented in the PHYLIP programs is a special case of the Tamura-Nei model as well. For amino acids the matrix Q_{ij} is fixed and does not contain any free parameters. Depending on the type of input data three different Q_{ij} are available in PUZZLE. The Dayhoff and JTT matrices are for use with proteins encoded on nuclear DNA, and the mtREV24 matrix is for use with proteins encoded on mtDNA. For doublets (pairs of dependent nucleotides) the SH model is implemented in PUZZLE. The corresponding matrix Q_{ij} reads

$$Q_{ij} = \begin{cases} 2\,\kappa & \text{for } i \to j \text{ transition substitution} \\ 1 & \text{for } i \to j \text{ transversion substitution} \\ 0 & \text{for } i \to j \text{ two substitutions} \end{cases}$$

The SH model basically is an F81 model for single substitutions in doublets.

Models of Rate Heterogeneity
Rate heterogeneity is taken into account by considering invariable sites and by introducing Gamma-distributed rates for the variable sites (Strimmer and von Haeseler 1997). For invariable sites the parameter theta ("Fraction of invariable sites") determines the probability of a given site to be invariable. If a site is invariable the probability for the constant site patterns is pi_i, the frequency of each nucleotide (amino acid). The rates r for variable sites are determined by a discrete Gamma distribution (Yang 1994) that approximates the continuous Gamma distribution

$$g(r) = \frac{\alpha^{\alpha}\, r^{\alpha-1}}{e^{\alpha r}\,\Gamma(\alpha)}$$

where the parameter alpha ranges from alpha = infinity (no rate heterogeneity) to alpha < 1.0 (strong heterogeneity). As alpha is very inconveniently scaled PUZZLE uses eta ("Gamma rate heterogeneity parameter"), which is related to alpha by

$$\eta = \frac{1}{1+\alpha}$$

A comparison of eta with alpha shows that eta is the more intuitive parameter:
In addition, the total rate heterogeneity (rho) of the "invariable sites + Gamma" model can then be written

$$\rho = \theta + \eta - \theta\,\eta$$
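The relation between alpha, eta and the total rate heterogeneity can be checked numerically. The following Python sketch is only an illustration with hypothetical function names. It takes rho = theta + eta - theta*eta, which is our reading of the formula for the "invariable sites + Gamma" model, and it maps alpha = infinity to eta = 0.

```python
import math

def eta_from_alpha(alpha):
    """eta = 1 / (1 + alpha); alpha = infinity means no rate heterogeneity."""
    if math.isinf(alpha):
        return 0.0
    return 1.0 / (1.0 + alpha)

def alpha_from_eta(eta):
    """Inverse mapping, alpha = (1 - eta) / eta."""
    if eta == 0.0:
        return math.inf
    return (1.0 - eta) / eta

def total_rate_heterogeneity(theta, eta):
    """rho = theta + eta - theta * eta (assumed reading of the formula)."""
    return theta + eta - theta * eta

# eta runs from 0 (uniform rates) towards 1 (strong heterogeneity)
for alpha in (math.inf, 9.0, 1.0, 0.25):
    print(alpha, eta_from_alpha(alpha))
```

Because eta is confined to the interval from 0 to 1, it combines with the fraction of invariable sites theta, which lives on the same scale, in the way two independent probabilities would.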
Options Available
All options available can be selected and changed after PUZZLE has read the input file. Depending on the input files, options are preselected and displayed in a menu ("PHYLIP look and feel"):

    GENERAL OPTIONS
     b                     Type of analysis?  Tree reconstruction
     k                Tree search procedure?  Quartet puzzling
     n             Number of puzzling steps?  1000
     u             Show unresolved quartets?  No
     o                  Display as outgroup?  Gibbon
     a                  Parameter estimates?  Approximate (faster)
     x            Parameter estimation uses?  Neighbor-joining tree
    SUBSTITUTION PROCESS
     d          Type of sequence input data?  Nucleotides
     h             Codon positions selected?  Use all positions
     m                Model of substitution?  HKY (Hasegawa et al. 1985)
     t    Transition/transversion parameter?  Estimate from data set
     f               Nucleotide frequencies?  Estimate from data set
    RATE HETEROGENEITY
     w          Model of rate heterogeneity?  Uniform rate

    Confirm [y] or change [menu] settings:

By typing the letters shown in the menu you can either change settings or enter new parameters. Some options (for example "m" and "w") can be invoked several times to switch through a number of different settings. The parameters of the models of sequence evolution can be estimated from the data by a variety of procedures based on maximum likelihood. The analysis is started by typing "y" at the input prompt. The following table lists all PUZZLE options in alphabetical order. Be aware, however, that not all of them are accessible at the same time:
Other Features
For nucleotide data PUZZLE computes the expected transition/transversion ratio and the expected pyrimidine transition/purine transition ratio corresponding to the selected model. Base frequencies play an important role in the calculation of these ratios. PUZZLE also tests with a chi-square test at the 5% level whether the base composition of each sequence is identical to the average base composition of the whole alignment. All sequences with deviating composition are listed in the output file. Ideally, no sequence (possibly except for the outgroup) should have a deviating base composition. Otherwise a basic assumption implicit in the maximum likelihood calculation is violated. A hidden feature of PUZZLE (since version 2.5) is the use of an advanced weighting scheme for quartets (Strimmer, Goldman, and von Haeseler 1997) in the quartet puzzling tree search. PUZZLE also computes the average distance between all pairs of sequences (maximum likelihood distances). The average distance can be viewed as a rough measure of the overall sequence divergence.
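To see why base frequencies matter for the expected transition/transversion ratio, the ratio can be computed from the TN rate matrix given in the section on models of substitution. The Python sketch below is an illustration under our reading of that definition (the flow of transitions divided by the flow of transversions at stationarity); the function names are hypothetical and PUZZLE's own computation may differ in detail.

```python
NUCLEOTIDES = "TCAG"
PYRIMIDINES = {"T", "C"}   # A and G are the purines

def q_entry(i, j, kappa, tau):
    """Symmetric TN term Q_ij for i != j, as in the rate matrix of the manual."""
    i_pyr, j_pyr = i in PYRIMIDINES, j in PYRIMIDINES
    if i_pyr != j_pyr:
        return 1.0                               # transversion
    if i_pyr:
        return tau * kappa * 4.0 / (tau + 1.0)   # pyrimidine transition
    return kappa * 4.0 / (tau + 1.0)             # purine transition

def expected_ts_tv_ratio(freqs, kappa, tau=1.0):
    """Expected transitions / expected transversions; tau = 1 gives HKY."""
    transitions = transversions = 0.0
    for i in NUCLEOTIDES:
        for j in NUCLEOTIDES:
            if i == j:
                continue
            # flow from i to j at stationarity: pi_i * R_ij = pi_i * Q_ij * pi_j
            flow = freqs[i] * q_entry(i, j, kappa, tau) * freqs[j]
            if (i in PYRIMIDINES) == (j in PYRIMIDINES):
                transitions += flow
            else:
                transversions += flow
    return transitions / transversions

uniform = {"T": 0.25, "C": 0.25, "A": 0.25, "G": 0.25}
skewed = {"T": 0.40, "C": 0.10, "A": 0.40, "G": 0.10}
print(expected_ts_tv_ratio(uniform, kappa=2.0))
print(expected_ts_tv_ratio(skewed, kappa=2.0))
```

With homogeneous frequencies and tau = 1 the ratio equals kappa, in line with the statement about the Kimura model; with skewed frequencies the same kappa gives a different expected ratio.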
Interpretation and Hints

Quartet Puzzling Support Values
The quartet puzzling (QP) tree search estimates support values for each internal branch. In principle, these values have the same practical meaning as bootstrap values. Indeed, it turns out that PUZZLE gives you estimates of support that are even numerically very similar to corresponding neighbor-joining bootstrap values. This means that branches showing a QP reliability from 90% to 100% are very strongly supported. In principle one can of course also trust branches with lower reliability, but in this case it is advisable to check how well the respective branch does in comparison to other branches in the tree (relative reliability). If you have a branch with low confidence it is also important to check the alternative groupings that are not included in the QP tree (they are all listed in the outfile!). There should be a significant gap between the lowest reliability value of the QP tree and the most frequent grouping that is not included in the QP tree. For example, if you have a support value of 60% and the not-included grouping occurs with a frequency of 20% then the 60% support for the branch is OK.

Percentage of Unresolved Quartets
PUZZLE computes the number and the percentage of completely unresolved maximum likelihood quartets. An unresolved quartet is a quartet where the maximum likelihood values for each of the three possible quartet topologies are so similar that it is not possible to prefer one of them (Strimmer, Goldman, von Haeseler 1996). The percentage of unresolved quartets among all possible quartets is an indicator of the suitability of the data for phylogenetic analysis. A high percentage usually results in a highly multifurcating quartet puzzling tree. If you have only a few unresolved quartets we recommend invoking option "u" to get a list of all these quartets.
In a likelihood mapping analysis the percentage of completely unresolved quartets is shown in the central basin of the triangle diagram.

Automatic Parameter Estimation
PUZZLE can estimate both the parameters involved in the models of substitution (TN, HKY) and in the model of rate variation (Gamma distribution, fraction of invariable sites) without prior knowledge of an overall tree by a number of different strategies based on maximum likelihood (Strimmer and von Haeseler, submitted). For all estimated parameters a corresponding standard error (S.E.) is computed. In most cases the results obtained are very satisfactory. However, if you have good arguments for choosing a different set of parameters than the values obtained by PUZZLE, don't hesitate to use them. If sequences are extremely similar it is very hard for any algorithm to extract information about the model from the data. Also, be careful if the estimated parameter values are very close to the internal upper and lower bounds:
Likelihood Mapping
Likelihood mapping is a method to analyse the support for internal branches in a tree without having to compute an overall tree. Every internal branch in a completely resolved tree defines up to four clusters of sequences. If one is interested in the relation of these groups a likelihood mapping analysis is adequate. Thus, only prior knowledge of the corresponding clusters is necessary. The likelihood mapping diagrams (as contained in various output files generated by PUZZLE) will then elucidate the possible relationships in detail. More about likelihood mapping will be published elsewhere (Strimmer and von Haeseler 1997).
Limits and Error Messages
PUZZLE has a built-in limit that allows data sets of up to 257 sequences only, in order to avoid overflow of internal integer variables. At least 32767 sites should be possible, depending on the compiler used. Computation time will be the largest constraint even if sufficient computer memory is available. If rate heterogeneity is taken into account every additional category slows down the overall computation by the amount of time needed for one complete run assuming rate homogeneity. If problems are encountered PUZZLE terminates program execution and returns a plain text error message. Depending on the severity, errors are classified into three groups:
On a standard machine (1996 UNIX workstation) with 32 to 64 MB RAM, PUZZLE can easily do maximum likelihood tree searches including estimation of support values for data sets with 50-100 sequences. More sequences are possible but probably not very useful (star tree!). As likelihood mapping is not memory consuming and computationally quite fast it can be applied to large data sets as well.
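The size of a quartet-based analysis grows quickly with the number of sequences, since n sequences contain n(n-1)(n-2)(n-3)/24 quartets. The Python sketch below (hypothetical helper names, Python 3.8 or later for math.comb) computes this count and compares the product n(n-1)(n-2)(n-3) with 4294967295, the upper end of the "unsigned long int" range mentioned in the installation notes. That this product is what motivates the limit of 257 sequences is our assumption; the manual only speaks of overflow of internal integer variables.

```python
from math import comb

UNSIGNED_32_MAX = 4294967295   # range of "unsigned long int" cited in the manual

def number_of_quartets(n):
    """Number of distinct quartets among n sequences, n choose 4."""
    return comb(n, 4)

def falling_product(n):
    """n(n-1)(n-2)(n-3), the numerator of n choose 4."""
    return n * (n - 1) * (n - 2) * (n - 3)

for n in (50, 100, 257, 258):
    fits = falling_product(n) <= UNSIGNED_32_MAX
    print(n, number_of_quartets(n), "fits in 32 bits" if fits else "exceeds 32 bits")
```

The product still fits into 32 bits for 257 sequences and no longer does for 258, which is at least consistent with the stated limit.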
Other Programs
There are a number of other very useful and widespread programs to reconstruct phylogenetic relationships and to analyse molecular sequence data that are available free of charge. Here are the URLs of some web pages that provide links to most of them (including the PHYLIP, MOLPHY, and PAML maximum likelihood programs):
Acknowledgements
The maximum likelihood kernel of PUZZLE is an offspring of the program NucML/ProtML version 2.2 by Jun Adachi and Masami Hasegawa (ftp://sunmh.ism.ac.jp/pub/molphy). We thank them for generously allowing us to use the source code of their program. The maze as icon for PUZZLE was suggested by Joe Felsenstein. Hans Zischler reported a problem with the input tree routine of previous versions of PUZZLE and Katja Nieselt-Struwe helped to improve the EPSF code. We thank Michael Schöniger and Matthias Krings for beta testing and José Castresana for making PUZZLE run under VMS. We thank Catherine Letondal and Liz Bailes for helping to correct a problem of PUZZLE 3.0 on DEC Alpha machines. Finally, we would like to thank the European Bioinformatics Institute (EBI) and the Institut Pasteur for kindly distributing the PUZZLE program and the Deutsche Forschungsgemeinschaft (DFG) for financial support.
References
Adachi, J. and M. Hasegawa. 1996. MOLPHY: programs for molecular phylogenetics, version 2.3. Institute of Statistical Mathematics, Tokyo.
Adachi, J. and M. Hasegawa. 1996. Model of amino acid substitution in proteins encoded by mitochondrial DNA. J. Mol. Evol. 42: 459-468.
Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary change in proteins. In: Dayhoff, M. O. (ed.) Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington DC, pp. 345-352.
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 17: 368-376.
Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.
Felsenstein, J. and G. A. Churchill. 1996. A hidden Markov model approach to variation among sites in rate of evolution. Mol. Biol. Evol. 13: 93-104.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22: 160-174.
Jukes, T. H. and C. R. Cantor. 1969. Evolution of protein molecules. In: Munro, H. N. (ed.) Mammalian Protein Metabolism, New York: Academic Press, pp. 21-132.
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. CABIOS 8: 275-282.
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16: 111-120.
Tamura, K. and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10: 512-526.
Tamura, K. 1994. Model selection in the estimation of the number of nucleotide substitutions. Mol. Biol. Evol. 11: 154-157.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res. 22: 4673-4680.
Saitou, N. and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4: 406-425.
Schöniger, M. and A. von Haeseler. 1994. A stochastic model for the evolution of autocorrelated DNA sequences. Mol. Phyl. Evol. 3: 240-247.
Strimmer, K. and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13: 964-969.
Strimmer, K., N. Goldman, and A. von Haeseler. 1997. Bayesian probabilities and quartet puzzling. Mol. Biol. Evol. 14: 210-211.
Strimmer, K. and A. von Haeseler. 1997. Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. PNAS (USA) 94: 6815-6819.
Strimmer, K. and A. von Haeseler. 1997. Parameter estimation for models of sequence evolution. Genetics. Submitted.
Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39: 306-314.
Version History
The PUZZLE program was first distributed in 1995. Since then it has been continually improved. Here is a list of the most important changes.