bppml (BppSuite Manual 2.2.0)

4.1 BppML: Bio++ Maximum Likelihood

The BppML program uses the common syntax introduced in the previous section for setting the alphabet, loading the sequences (see Sequences), specifying the model (see Model), and estimating parameters (see Estimation).

The BppML program allows you to optimize tree topologies and model parameters and perform a bootstrap analysis.

4.1.1 Branch lengths initial values

init.tree = {user|random}

Set how the initial tree is obtained. The user option loads an existing tree file, as described in the Common options section. This file may have been built using another method, such as neighbor joining or parsimony. The random option picks a random tree, which is handy for testing convergence. It may, however, slow down the optimization process significantly.

init.brlen.method = {method description}

Set how to initialize the branch lengths. Available methods include:

Input(midpoint_root_branch={boolean}): Keep the initial branch lengths as they are. The additional argument specifies whether the root should be moved to the midpoint of the branch containing it.
Equal(value={float>0}): Set all branch lengths to the same value, provided as argument.
Clock: Coerce to a clock tree.
Grafen(height={{real>0}|input}, rho = {real>0}): Use Grafen’s method to compute branch lengths. In Grafen’s method, each node is given a weight equal to the number of underlying leaves. The length of each branch is then computed as the difference between the weights of the two connected nodes, divided by the number of leaves in the tree. The heights of all nodes are then raised to the power of rho, a user-specified value. The tree is finally scaled to match a given total height, which can be the original one (height=input) or a fixed value (usually height=1). A value of rho=0 gives a star tree, and the greater the value of rho, the more recent the inner nodes.

input.tree.check_root = {boolean}

Tell whether the input tree should be checked for the presence of a root. If set to yes (the default), rooted trees will be unrooted if a homogeneous model is used. If set to no, a rooted tree will be fitted, which can lead to optimization issues in most cases. Use the non-default option with care!
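
As an illustration, the options of this section could be combined as follows. This is a minimal sketch rather than a recommended setup: the values height=1 and rho=1 are arbitrary choices made for the example, and the tree file itself is assumed to be supplied through the options of the Common options section, which are not repeated here.

```text
# Start from a user-supplied tree (for example a neighbor-joining tree).
init.tree = user
# Recompute branch lengths with Grafen's method, scaled to a total height of 1.
init.brlen.method = Grafen(height=1, rho=1)
# Keep the default behaviour: rooted trees are unrooted for homogeneous models.
input.tree.check_root = yes
```

To keep the branch lengths of the input tree instead, the second option could be replaced with init.brlen.method = Input(midpoint_root_branch=no).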

4.1.2 Topology optimization

optimization.topology = {boolean}: Enable the tree topology estimation.
optimization.topology.algorithm = {NNI}: Algorithm to use for topology estimation: only NNI available for now.
optimization.topology.algorithm_nni.method = {fast|better|phyml}: Set the NNI method to use. fast: test all NNIs sequentially and perform an NNI as soon as one improving the likelihood is found. better: test all possible NNIs and perform the one with the largest likelihood increase. phyml: test all possible NNIs and try performing all the improving ones; if the final likelihood is better, perform all of them, otherwise try performing half of them, and so on. In most cases the phyml option shows the best performance.
optimization.topology.nstep = {int>0}: Number of phyml topology movement steps before re-optimizing parameters.
optimization.topology.numfirst = {boolean}: Shall we estimate parameters before looking for topology movements?
optimization.topology.tolerance.before = {real>0}: Tolerance for the parameter estimation performed before the topology search. The topology tolerances should not be set too low, both to save computation time and to obtain a better topology estimate. The optimization.tolerance parameter will be used for the final optimization of numerical parameters (see Common options).
optimization.topology.tolerance.during = 100: Tolerance for the parameter estimation performed during the topology search.
optimization.scale_first = no: Shall we first scale the tree before optimizing parameters?
optimization.scale_first.tolerance = {double}: The convergence criterion to reach when scaling the tree.
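
A possible topology search setup is sketched below. It enables NNI with the phyml method and uses loose tolerances while the topology is still changing. The value 100 is the one shown above for optimization.topology.tolerance.during; reusing it for optimization.topology.tolerance.before and setting nstep to 4 are assumptions made for illustration, not documented defaults.

```text
optimization.topology = yes
optimization.topology.algorithm = NNI
optimization.topology.algorithm_nni.method = phyml
# Re-optimize numerical parameters after 4 phyml topology movement steps (illustrative value).
optimization.topology.nstep = 4
# Estimate parameters before looking for topology movements.
optimization.topology.numfirst = yes
# Loose tolerances for the estimations made before and during the topology search.
optimization.topology.tolerance.before = 100
optimization.topology.tolerance.during = 100
```

The final optimization of numerical parameters still uses optimization.tolerance, so the loose values above only affect the intermediate estimations.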

4.1.3 Molecular clock

BppML can also optimize branch lengths with a molecular clock:

optimization.clock = {no|global}: Tell whether a molecular clock should be assumed. Topology estimation is not possible with a clock constraint.
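
For instance, a clock analysis on a fixed topology might be requested as in the following sketch, which only uses options documented in this chapter:

```text
# Assume a global molecular clock when optimizing branch lengths.
optimization.clock = global
# Topology estimation is not possible under a clock constraint, so keep it disabled.
optimization.topology = no
```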

4.1.4 Output results

output.infos = {{path}|none}: Alignment information log file (site-specific rates, etc.).
output.estimates = {{path}|none}: File where the estimated values of numerical parameters are written.
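
As a short illustration, both outputs could be requested as follows. The file names are hypothetical placeholders; either option can be set to none to skip the corresponding file.

```text
# Per-site information (site-specific rates, etc.); hypothetical file name.
output.infos = bppml_run.infos
# Estimated values of numerical parameters; hypothetical file name.
output.estimates = bppml_run.params.txt
```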

4.1.5 Bootstrap analysis

bootstrap.number = {int>0}: Number of replicates. A reasonable value would be >= 100.
bootstrap.approximate = {boolean}: Tell whether numerical parameters should be kept at their initial values when bootstrapping.
bootstrap.verbose = {boolean}: Set this to yes for detailed output when bootstrapping.
bootstrap.output.file = {{path}|none}: Where to write the resulting trees (multi-tree Newick format).
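
A bootstrap analysis with the suggested minimum of 100 replicates could be configured as in this sketch. The output file name is a hypothetical placeholder, and the choice of values for bootstrap.approximate and bootstrap.verbose is only one possible setting.

```text
# Number of bootstrap replicates (a reasonable value is >= 100).
bootstrap.number = 100
# Do not keep numerical parameters at their initial values for the replicates.
bootstrap.approximate = no
# No detailed output while bootstrapping.
bootstrap.verbose = no
# Replicate trees, written in multi-tree Newick format; hypothetical file name.
bootstrap.output.file = bppml_run.bootstrap.dnd
```

Setting bootstrap.approximate = yes keeps the numerical parameters at their initial values for every replicate.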

4.1.6 Rather technical options

These options are mainly for debugging or testing purposes; in most cases you will be happy with the default settings.

likelihood.recursion = {simple|double}

Set the type of likelihood recursion to use. simple: derivatives take more time to compute, but likelihood computation is faster. For big data sets, it can also save a lot of memory, particularly when the data are compressed. double: uses more memory and needs more time to compute the likelihood, due to the double recursion. Analytical derivatives are, however, faster to compute.

This command has no effect in the following cases: (i) topology estimation: this requires a double recursive algorithm, (ii) optimization with a molecular clock: a simple recursion with data compression is used in this case, due to the impossibility of computing analytical derivatives.

likelihood.recursion_simple.compression = {simple|recursive}

Site compression for the simple recursion. simple: identical sites are not computed twice. recursive: look for site patterns to save computation time during optimization, at the cost of extra time for building the patterns. The recursive option is usually the best choice, particularly for nucleotide data sets.
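
Putting the two options together, a memory-friendly setup for a large data set on a fixed topology without a clock might look like the following sketch. It is only an illustration of the syntax; as noted above, likelihood.recursion has no effect when the topology is estimated or a molecular clock is used.

```text
# Simple recursion: faster likelihood computation, slower derivatives.
likelihood.recursion = simple
# Look for site patterns to save computation time during optimization.
likelihood.recursion_simple.compression = recursive
```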