Character simulations and randomizations
Mesquite can simulate and randomize
characters to build statistical tests. On this page we give an
overview of these features. A more in-depth account of simulation
of DNA sequence evolution is given separately.
Using results of simulations & randomizations
The simulated or randomized characters can be used or stored
in several ways:
- The characters can be stored into matrices in the current
file by choosing options in the Make New Matrix from
submenu of the Characters menu. For instance, if you choose
Simulated Matrices on Current Tree, the simulated matrix will
be stored in the file.
- The characters may be used directly, at that moment, in calculations.
For example, if you make a Bar & Line Chart for Characters,
and choose Simulated Characters as your source of characters,
the characters will be simulated and used in the chart without
being stored in the file.
- A series of many data files can be saved, each one with a
different replicate of the simulated or randomized data matrix.
This is available through the Save Multiple Matrices
submenu of the Character menu.
- A series of many data files can be saved in combination with
scripting files to instruct programs such as Swofford's PAUP
to run the files. This can be done using the Batch Architect,
which is described on the page on DNA
simulations and in some of the Studies.
To replicate the results of a simulation or randomization, you
can use the Set Seed menu item to set the random
number seed used. If you are using the same conditions, including
the same seed, the simulations and randomizations should be reproducible.
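Why a fixed seed makes runs repeatable can be shown with a short Python sketch. It uses Python's own random module as a stand-in for Mesquite's generator, so it will not reproduce Mesquite's actual numbers; the function name simulate_replicates is hypothetical. Two runs with the same seed and the same settings draw identical replicates.

```python
import random


def simulate_replicates(seed, n_reps, n_taxa):
    """Draw n_reps replicate binary 'characters' from a seeded generator.

    Illustrative stand-in only: Python's random module, not Mesquite's
    random number generator.
    """
    rng = random.Random(seed)  # independent generator with a fixed seed
    return [[rng.randint(0, 1) for _ in range(n_taxa)] for _ in range(n_reps)]


run_a = simulate_replicates(seed=42, n_reps=3, n_taxa=5)
run_b = simulate_replicates(seed=42, n_reps=3, n_taxa=5)
print(run_a == run_b)  # same seed, same conditions
```

Changing either the seed or the conditions (for example, the number of taxa) changes the sequence of draws, which is why both must match to reproduce a result.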
Simulations of character evolution
Stochastic models can be used to simulate character evolution
along the branches of a phylogenetic tree by selecting Simulated
Characters (to generate characters one at a time) or Simulated
Matrices on Current Tree (to generate whole matrices, on a current
tree in a Tree Window), or Simulated Matrices on Trees (to generate
whole matrices, each one on a different tree from a source of
trees). These options are available whenever characters or matrices
might be called for, for instance when making a chart of characters
or matrices.
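The idea of simulating along the branches can be sketched as a root-to-tip traversal: each node's state is obtained by evolving its parent's state along the branch leading to it. This is a minimal Python sketch, not Mesquite's code; the dictionary tree encoding, the names simulate_on_tree and brownian_branch, and the example tree are all hypothetical. Brownian motion is used as the branch model, with the change along a branch drawn from a normal distribution with mean 0 and variance equal to the rate times the branch length.

```python
import math
import random

# Hypothetical tree encoding for this sketch: every node has a name and a
# list of children; non-root nodes also carry the length of the branch
# leading to them.
TREE = {"name": "root", "children": [
    {"name": "n1", "length": 1.0, "children": [
        {"name": "A", "length": 0.5, "children": []},
        {"name": "B", "length": 0.5, "children": []},
    ]},
    {"name": "C", "length": 1.5, "children": []},
]}


def simulate_on_tree(tree, root_state, evolve_branch, rng):
    """Evolve one character from the root toward the tips.

    evolve_branch(state, length, rng) returns the state at the end of a
    branch. Returns the state at every node, internal nodes included.
    """
    states = {tree["name"]: root_state}
    pending = [(child, root_state) for child in tree["children"]]
    while pending:
        node, parent_state = pending.pop()  # a parent is always visited first
        state = evolve_branch(parent_state, node["length"], rng)
        states[node["name"]] = state
        pending.extend((child, state) for child in node["children"])
    return states


def brownian_branch(rate):
    """Brownian motion: change along a branch is Normal(0, variance rate * length)."""
    def evolve(state, length, rng):
        return state + rng.gauss(0.0, math.sqrt(rate * length))
    return evolve


rng = random.Random(1)
states = simulate_on_tree(TREE, 0.0, brownian_branch(rate=1.0), rng)
tips = {name: states[name] for name in ("A", "B", "C")}
print(tips)
```

Because states are recorded at every node, the internal entries (here root and n1) are the "true" ancestral states of that replicate, in contrast to states reconstructed afterwards from the tips. Another model can be plugged in by passing a different evolve_branch function.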
The following are the character types and models that can be
simulated:
- Evolve Categorical characters. The following
models are also discussed in the section on likelihood
reconstructions.
- Mk1 model — Single parameter model
analogous to Jukes-Cantor. Rates of change are equal for all
types of state-to-state changes.
- AsymmMk model — Two parameter asymmetrical
model with differing rates of forward and backward changes.
Forward changes are those in which state number increases
(e.g., state 0 to state 1); backward changes are those in
which state decreases (e.g., state 1 to state 0). One can
specify the forward and backward rates directly, or alternatively,
one can specify an overall rate of change in combination
with a bias of forward versus backward. This model will
generally be appropriate only for binary characters. You
can also specify whether the states at the root are assumed
to be at the equilibrium frequencies implied by the model.
If so, then Mesquite chooses an ancestral state in the simulations
according to the equilibrium frequencies implied by the
bias in gains versus losses. Otherwise, Mesquite chooses
a state with equal probabilities.
- Evolve DNA characters
- Evolve Continuous characters
- Brownian motion model — Model with
a single parameter, the rate of change.
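The two categorical models can be made concrete through their transition probabilities over a branch of length t. For a binary character with forward rate q01 (0 to 1) and backward rate q10 (1 to 0), let r = q01 + q10 and pi1 = q01 / r; then P(0 to 1) = pi1 (1 - e^(-r t)) and P(1 to 1) = pi1 + (1 - pi1) e^(-r t). For Mk1 with k states and an assumed rate q for each particular state-to-state change, P(same state at both ends) = 1/k + ((k - 1)/k) e^(-k q t). The Python sketch below samples end-of-branch states from these formulas. It assumes these rate parameterizations, which may not match how Mesquite scales its rate parameters or its overall-rate-and-bias option; the function names are hypothetical.

```python
import math
import random


def asymm_mk_branch(q01, q10):
    """Evolve a binary state along one branch under a two-rate model.

    q01 is the forward rate (0 -> 1), q10 the backward rate (1 -> 0);
    both are assumed positive in this sketch.
    """
    total = q01 + q10
    pi1 = q01 / total  # equilibrium frequency of state 1

    def evolve(state, length, rng):
        decay = math.exp(-total * length)
        if state == 0:
            p_one = pi1 * (1.0 - decay)        # P(0 -> 1)
        else:
            p_one = pi1 + (1.0 - pi1) * decay  # P(1 -> 1)
        return 1 if rng.random() < p_one else 0

    return evolve


def asymm_root_state(q01, q10, use_equilibrium, rng):
    """Pick the root state at equilibrium frequencies or with equal odds."""
    pi1 = q01 / (q01 + q10) if use_equilibrium else 0.5
    return 1 if rng.random() < pi1 else 0


def mk1_branch(k, rate):
    """Evolve a k-state character along one branch under Mk1.

    'rate' is assumed to be the rate of each specific i -> j change.
    """
    def evolve(state, length, rng):
        p_same = 1.0 / k + (k - 1.0) / k * math.exp(-k * rate * length)
        if rng.random() < p_same:
            return state
        # Otherwise end in one of the other k - 1 states with equal probability.
        other = rng.randrange(k - 1)
        return other if other < state else other + 1

    return evolve


rng = random.Random(3)
evolve = asymm_mk_branch(q01=3.0, q10=1.0)
# Long branch: the frequency of state 1, starting from 0, approaches pi1 = 0.75.
freq = sum(evolve(0, 10.0, rng) for _ in range(20000)) / 20000
print(round(freq, 2))
```

On a long branch the starting state is forgotten, so the frequency of state 1 approaches the equilibrium pi1. Choosing the root state with use_equilibrium=True mirrors the option of starting at those equilibrium frequencies, while False picks each state with probability 1/2.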
To use these simulations, the appropriate character model must
be defined in advance, with all of its parameters specified. Two
models come built-in: a Jukes-Cantor model for DNA simulations,
and a Brownian motion model for continuous variable simulations.
If you want any other models, create them using New Character
Model in the Characters menu.
Viewing results of simulations
Simulated characters can be used in many calculations, but if
you want to visualize the results of a simulation directly, you
can use the Trace Character History feature available
in the Analysis menu of the Tree Window. By default Trace Character
History shows a reconstruction of ancestral states. Thus, if the
character is simulated, the states at nodes shown would not be
the "true" ancestral states that occurred during the
simulation, but rather states inferred from the states given to
the terminal taxa by the simulation. However, once Trace Character
History is active, its Trace menu has a Character History
Source menu item. Choose Simulate Ancestral States
and specify the simulation. The states indicated at the nodes
will then be the true ancestral states in the simulation. You
can set the Seed to make the simulation equivalent to simulations
done in other contexts.
Randomizing characters
Existing characters can be randomized as follows:
- Reshuffle Character — Supplies replicate
reshufflings of a single chosen character. In each reshuffling,
the character states are randomly scrambled among taxa, keeping
the frequencies of different character states fixed.
- Reshuffle States within Characters—
Supplies matrices, each of which is a reshuffling of an existing
matrix. The first character of the matrix is a reshuffling of
the first character of the original matrix; the second character
is a reshuffling of the second original character; and so on.
You can think of this as reshuffling within each column of the
matrix.
- Reshuffle Within Characters (Taxa Partitioned)
— As for Reshuffle States within Characters, but respecting
taxa partitions. Each character of the matrix is a reshuffling
of the respective character of the original matrix, but done
only within groups of the current taxa partition. Thus, if the taxa
have a current partition that
divides the taxa into Group A and Group B, then the reshuffling
within a character first shuffles all states of the character
among Group A taxa. Then, it shuffles states of the character
within Group B taxa.
- Reshuffle States Within Taxa — Supplies
matrices, each of which is a reshuffling of an existing matrix.
Instead of reshuffling within each column, this reshuffles within
each row. Thus, the states of each taxon are reshuffled among
characters. This might be used, for instance, to generate DNA
data with no phylogenetic signal that preserves the base composition
of each taxon.
- Reshuffle Within Taxa (Char. Partitioned)
— As for Reshuffle States within Taxa, but respecting
character partitions. Thus, if the characters have a current
partition that divides them into (for example) groups 28S and
COI, then the reshuffling within a taxon first shuffles all
states of a taxon among 28S sites, then shuffles all states
of the taxon among COI sites.
- Bootstrap resample — Supplies matrices,
each of which is a bootstrap resampled version of an existing
matrix. Characters are sampled randomly from the original matrix
and moved into the resampled matrix until it contains as many
characters as were in the original matrix. Some of the original
characters may by chance be sampled more than once; some may
not be sampled at all.
- Rarefy characters — Supplies matrices,
each of which is derived from an existing matrix by randomly
deleting entire characters.
- Sprinkle missing — Supplies matrices,
each of which is derived from an existing matrix by randomly
assigning "missing" (? or unassigned) to cells of
the matrix with a particular probability.
- Add noise (for continuous matrices only)
— Available in the Character Matrix Editor under Alter/Transform,
this adds noise to the states of all or selected cells of the
matrix.
- Random Fill — Available in the Alter/Transform
submenu of the Matrix menu of the Character Matrix Editor, it can be
used to fill all or selected cells of the matrix with randomly
chosen states.
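Several of these randomizations reduce to simple operations on the rows (taxa) and columns (characters) of a matrix. The Python sketch below shows plausible versions of column reshuffling (optionally within taxa groups), row reshuffling, bootstrap resampling, rarefaction, and sprinkling missing data. It is illustrative only: the function names, the list-of-rows matrix encoding, and parameters such as n_keep and p are hypothetical and are not taken from Mesquite.

```python
import random

# Hypothetical encoding: a matrix is a list of rows (taxa), each row
# listing one state per character (column).


def reshuffle_within_characters(matrix, rng):
    """Permute states among taxa separately within each column."""
    out = [row[:] for row in matrix]
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        rng.shuffle(column)
        for i, state in enumerate(column):
            out[i][j] = state
    return out


def reshuffle_within_characters_partitioned(matrix, groups, rng):
    """As above, but states move only among taxa with the same group label."""
    out = [row[:] for row in matrix]
    for label in dict.fromkeys(groups):  # group labels in first-seen order
        members = [i for i, g in enumerate(groups) if g == label]
        for j in range(len(matrix[0])):
            column = [matrix[i][j] for i in members]
            rng.shuffle(column)
            for i, state in zip(members, column):
                out[i][j] = state
    return out


def reshuffle_within_taxa(matrix, rng):
    """Permute each taxon's states among characters (row by row)."""
    out = []
    for row in matrix:
        shuffled = row[:]
        rng.shuffle(shuffled)
        out.append(shuffled)
    return out


def bootstrap_resample(matrix, rng):
    """Sample characters with replacement until the original count is reached."""
    n_chars = len(matrix[0])
    picks = [rng.randrange(n_chars) for _ in range(n_chars)]
    return [[row[j] for j in picks] for row in matrix]


def rarefy(matrix, n_keep, rng):
    """Keep a random subset of n_keep characters and delete the rest."""
    keep = sorted(rng.sample(range(len(matrix[0])), n_keep))
    return [[row[j] for j in keep] for row in matrix]


def sprinkle_missing(matrix, p, rng, missing="?"):
    """Replace each cell with the missing marker with probability p."""
    return [[missing if rng.random() < p else s for s in row] for row in matrix]


rng = random.Random(0)
m = [list("ACGTA"), list("AAGTC"), list("CCGAT"), list("GTTAC")]
print(reshuffle_within_characters(m, rng))
```

Reshuffling with rng.shuffle permutes existing states rather than drawing new ones, which is what keeps the state frequencies of each column (or, for row reshuffling, each taxon's base composition) unchanged.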