

3.5.2 Setting up non-stationary / non-homogeneous models

You can specify a wide range of non-homogeneous models, by combining different options.

3.5.2.1 One-per-branch non-homogeneous models

This option shares the same parameters as the homogeneous case, since the same kind of model is used for each branch. The additional option is the following:

nonhomogeneous_one_per_branch.shared_parameters = {list<chars>}

List the names of the parameters that are shared by all branches. In the Galtier & Gouy model, that would be T92.kappa, since only the theta parameter is branch-specific.

The ’*’ wildcard can be used, as in *theta* for all the parameters whose name has theta in it.
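To illustrate what a pattern such as *theta* selects, here is a minimal Python sketch. It is not Bio++ code: the parameter names are hypothetical, and Python's fnmatch module is used only as a stand-in for the wildcard matching (it also gives a special meaning to '?' and '[...]', which this manual does not mention).

```python
from fnmatch import fnmatchcase


def select_parameters(names, pattern):
    """Return the names matching a pattern where '*' stands for any run of characters."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical parameter names, for illustration only.
names = ["T92.kappa", "T92.theta_1", "T92.theta_2"]
shared = select_parameters(names, "*theta*")
print(shared)
```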

3.5.2.2 General non-homogeneous models

Bio++ provides a general syntax to specify almost any non-homogeneous model.

nonhomogeneous.number_of_models = {int>0}

Set the number of distinct models to use.

You now have to configure each model individually, using the syntax introduced for the homogeneous case, except that the models are numbered, for instance:

model1 = T92(theta=0.39, kappa=2.79)

An additional option is available to attach the model to branches in the tree, each branch being specified by the id of its upper node:

model1.nodes_id = 1,5,10:15,19

Specify the ids of the nodes to which the model is attached. Id ranges can be specified using the begin:end syntax.
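The following Python sketch shows how a node id list such as the one above could be expanded. It is an illustration, not the Bio++ parser, and the function name is hypothetical. The end of a range is assumed to be inclusive, which is consistent with a specification such as 2:3 designating branches 2 and 3.

```python
def parse_node_ids(spec):
    """Expand a spec such as '1,5,10:15,19' into a sorted list of node ids."""
    ids = set()
    for item in spec.split(","):
        item = item.strip()
        if ":" in item:
            begin, end = (int(x) for x in item.split(":"))
            # Assumption: the 'end' bound is part of the range.
            ids.update(range(begin, end + 1))
        else:
            ids.add(int(item))
    return sorted(ids)


node_ids = parse_node_ids("1,5,10:15,19")
print(node_ids)
```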

Finally, you may find the following option useful:

output.parameter_names.file = {{path}|none}

A text file listing all parameter names. This might come in handy to specify the parameters that should not be optimized (see optimization.ignore_parameter) or that should be aliased (see above). Using this option causes the program to exit just after producing the list file.

3.5.2.3 Paths among non-homogeneous mixture models

To define constraints for sites between submodels, we can set "paths" that any site must follow. For example, in the following description:

nonhomogeneous = general
nonhomogeneous.number_of_models = 3

model1=T92()
model2=MixedModel(model=T92(kappa=Simple(values=(4,10,20),probas=(0.1,0.5,0.4))))
model3=MixedModel(model=TN93(theta1=Simple(values=(0.1,0.5,0.9),probas=(0.3,0.2,0.5))))

model1.nodes_id=0:1
model2.nodes_id=2:3
model3.nodes_id=4:5

In this case, on branches 2 & 3 a site follows any submodel of model 2 (but the same submodel on both branches), and on branches 4 & 5, a site follows any submodel of model 3 (the same on both branches as well). But there is no constraint between models 2 & 3, which means that a site can follow any submodel of model 2 and any submodel of model 3.

If the user wants a site with T92.kappa=4 in model 2 to have TN93.theta1=0.5 in model 3, a site with T92.kappa=10 in model 2 to have TN93.theta1=0.9 in model 3, and the other cases to be free (here this means that T92.kappa=20 in model 2 is linked with TN93.theta1=0.1 in model 3), then the following declarations can be used:

site.number_of_paths=2
site.path1=model2[T92.kappa_1] & model3[TN93.theta1_2]
site.path2=model2[T92.kappa_2] & model3[TN93.theta1_3]

The third path (for the remaining submodels) is automatically computed.
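As an illustration of how the remaining path follows from the two declared ones, the Python sketch below collects, for each mixed model, the submodels that no declared path uses. The function and the data layout are hypothetical and do not reflect how Bio++ stores paths.

```python
def remaining_path(submodels, declared_paths):
    """Return, for each model, the submodels that no declared path uses.

    submodels: dict mapping a model name to its ordered list of submodel labels.
    declared_paths: list of dicts mapping a model name to the set of labels
    used by that path.
    """
    rest = {}
    for model, labels in submodels.items():
        used = set()
        for path in declared_paths:
            used |= path.get(model, set())
        rest[model] = [label for label in labels if label not in used]
    return rest


submodels = {
    "model2": ["T92.kappa_1", "T92.kappa_2", "T92.kappa_3"],
    "model3": ["TN93.theta1_1", "TN93.theta1_2", "TN93.theta1_3"],
}
declared = [
    {"model2": {"T92.kappa_1"}, "model3": {"TN93.theta1_2"}},
    {"model2": {"T92.kappa_2"}, "model3": {"TN93.theta1_3"}},
]
third = remaining_path(submodels, declared)
print(third)
```

With the two paths declared above, the remaining path pairs T92.kappa_3 (kappa=20) with TN93.theta1_1 (theta1=0.1).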

It is possible to link mixtures of submodels. For example,

site.path1=model2[T92.kappa_1] & model3[TN93.theta1_2] & model3[TN93.theta1_3]

means that a site that has T92.kappa=4 in model2 has either TN93.theta1=0.5 or TN93.theta1=0.9 in model3.

Because of these constraints, the probabilities of the submodels are linked. In the first example, the probability of T92.kappa=4 in model 2 equals the probability of TN93.theta1=0.5 in model 3. Since this contradicts the probabilities declared in models 2 and 3, the reference probabilities are those of the first numbered mixed model, here model 2. In this case, the probabilities in model 3 may have no use. In the second example, however, the probability of submodel T92.kappa=4 equals the sum of the probabilities of submodels TN93.theta1=0.5 and TN93.theta1=0.9, and the relative proportions of those submodels in the declaration of model 3 are used to split it. Their respective probabilities are then 0.1*0.2/(0.2+0.5)=0.0286 and 0.1*0.5/(0.2+0.5)=0.0714.
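The arithmetic of the second example can be checked with a short Python sketch. The function name is hypothetical. It only reproduces the rule stated above: the probability of the path comes from model 2, and it is shared among the linked submodels of model 3 in proportion to their declared probabilities.

```python
def split_path_probability(path_probability, declared_probas):
    """Share a path probability among submodels, proportionally to their declared probabilities."""
    total = sum(declared_probas.values())
    return {name: path_probability * p / total for name, p in declared_probas.items()}


# Probability of T92.kappa=4 in model 2 is 0.1; the linked submodels of
# model 3 were declared with probabilities 0.2 and 0.5.
probs = split_path_probability(0.1, {"TN93.theta1=0.5": 0.2, "TN93.theta1=0.9": 0.5})
for name, p in probs.items():
    print(f"{name}: {p:.4f}")
```

The two values printed are 0.0286 and 0.0714, and they add up to 0.1, the probability of T92.kappa=4 in model 2.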

Concerning the optimization procedure, this choice may entail the non-identifiability of several parameters (here the probabilities in model 3), so the user should be careful about this.

Here is another example, for mixtures of mixed models, where the submodels are designated by their names:

nonhomogeneous = general
nonhomogeneous.number_of_models = 2

model1=LLG08_UL2()
model2=LLG08_UL3()

site.number_of_paths=2
site.path1=model1[LLG08_UL2.M2] & model2[LLG08_UL3.Q1]
site.path2=model1[LLG08_UL2.M1] & model2[LLG08_UL3.Q2] & model2[LLG08_UL3.Q3]

When the nonhomogeneous option is set to one_per_branch, each site is constrained to follow the same submodel from leaves to root.

3.5.2.4 Root frequencies

In the case of nonstationary models, the ancestral frequencies are distinct parameters. If a model is assumed to be stationary, the “None” parameter value can be used, which is strictly equivalent to setting nonhomogeneous.stationary=yes.

When the model is a mixture model, there is no single set of equilibrium frequencies. With this option, the root frequencies are then set to the average of the equilibrium frequencies of the submodels, weighted by the respective probabilities of the submodels.
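The weighted average described above can be sketched in Python as follows. The equilibrium frequencies and submodel probabilities are made-up values for illustration, and the function name is hypothetical.

```python
def average_frequencies(submodel_freqs, probas):
    """Average the equilibrium frequencies of the submodels, weighted by their probabilities."""
    n_states = len(submodel_freqs[0])
    return [
        sum(p * freqs[i] for freqs, p in zip(submodel_freqs, probas))
        for i in range(n_states)
    ]


# Hypothetical equilibrium frequencies (A, C, G, T) of three submodels.
freqs = [
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.40, 0.40, 0.10],
    [0.40, 0.10, 0.10, 0.40],
]
probas = [0.1, 0.5, 0.4]  # hypothetical submodel probabilities, summing to 1
root = average_frequencies(freqs, probas)
print(root)
```

Since each submodel's frequencies sum to 1 and the probabilities sum to 1, the averaged root frequencies also sum to 1.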

Since version 0.4.0, BppSuite uses the keyval syntax to set up root frequencies:

nonhomogeneous.root_freq={frequency set description}

The frequencies set can be any of those described below (see Frequencies sets), depending on the alphabet used.

