Codon (BppSuite Manual 2.2.0)

3.5.1.4 Codon models

Standard codon models: the global genetic_code argument describes the genetic code and has to be specified.

Codon models also take as argument a frequencies option specifying the equilibrium frequencies of the model. Any frequencies description can be used here, but the syntax also supports options similar to the ones used in the PAML software:

F0: all frequencies are assumed to be fixed and equal to 1/61, 0 for stop codons.
F1X4: 4 distinct frequencies are used, with parameters theta, theta1, theta2 (see Frequencies sets, “Full” method).
F3X4: 4 distinct frequencies are used for each position, resulting in 9 parameters in total (3 independent “Full” frequencies sets).
F61: free equilibrium frequencies, stop codons set to 0.

The optional mgmtStopCodon argument defines how the frequencies assigned to stop codons under F1X4 and F3X4 are redistributed to the other codons.

uniform : each stop frequency is distributed evenly
linear : each stop frequency is distributed to the neighbour codons (i.e. 1 substitution away), in proportion to each target codon frequency.
quadratic (default): each stop frequency is distributed to the neighbour codons (i.e. 1 substitution away), in proportion to the square of each target codon frequency.
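
The linear and quadratic schemes can be illustrated with a short Python sketch. It is not the Bio++ implementation. The function names, the stop codons of the standard genetic code, the restriction of neighbours to non-stop codons, and the use of the frequencies before redistribution as weights are all assumptions made for illustration. The uniform scheme is left out because the description does not say over which codons the frequency is spread.

```python
from itertools import product

NUCS = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def product_frequencies(pos_freqs):
    """Codon frequencies as products of per-position nucleotide frequencies.

    pos_freqs: list of three dicts nucleotide -> frequency (F3X4 style);
    pass the same dict three times for an F1X4-style set.
    """
    return {
        "".join(c): pos_freqs[0][c[0]] * pos_freqs[1][c[1]] * pos_freqs[2][c[2]]
        for c in product(NUCS, repeat=3)
    }


def neighbours(codon):
    """Non-stop codons exactly one substitution away from `codon`."""
    out = []
    for pos in range(3):
        for n in NUCS:
            if n != codon[pos]:
                cand = codon[:pos] + n + codon[pos + 1:]
                if cand not in STOPS:
                    out.append(cand)
    return out


def redistribute_stops(freqs, power):
    """Move each stop codon frequency to its neighbours.

    power=1 mimics 'linear', power=2 mimics 'quadratic': the share of a
    neighbour is proportional to its original frequency raised to `power`.
    """
    result = dict(freqs)
    for stop in STOPS:
        targets = neighbours(stop)
        weights = [freqs[t] ** power for t in targets]
        total = sum(weights)
        for t, w in zip(targets, weights):
            result[t] += freqs[stop] * w / total
        result[stop] = 0.0
    return result


# Hypothetical nucleotide frequencies shared by the three positions
nuc = {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4}
raw = product_frequencies([nuc, nuc, nuc])
linear = redistribute_stops(raw, power=1)
quadratic = redistribute_stops(raw, power=2)
```

In both cases the stop codons end up with frequency 0 and the remaining 61 frequencies still sum to 1. The quadratic scheme concentrates more of the stop frequency on the most frequent neighbours.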

The same keywords can be used to specify root frequencies for codon models in the non-stationary case.

GY94([genetic_code={genetic code description}, kappa={real>0}, V={real>0}, "equilibrium frequencies"]): Goldman and Yang (1994) substitution model for codons (default values: kappa=1 and V=10000). See the Bio++ description.
MG94([genetic_code={genetic code description}, rho={real>0}, "equilibrium frequencies"]): Muse and Gaut (1994) substitution model for codons (default values: rho=1). See the Bio++ description.
YN98([genetic_code={genetic code description}, kappa={real>0}, omega={real>0}, "equilibrium frequencies"]): Yang and Nielsen (1998) substitution model for codons (default values: kappa=1 and omega=1). See the Bio++ description.
YNGKP_M0([genetic_code={genetic code description}, kappa={real>0}, omega={real>0}, "equilibrium frequencies"]): The M0 model of PAML, i.e. the same as YN98. See the Bio++ description.
YNGKP_M1([genetic_code={genetic code description},kappa={real>0}, omega={real>0}, p0={real>0 and <1 }, "equilibrium frequencies"]): The M1a model of PAML, see Yang, Z., R. Nielsen, N. Goldman, and A.-M. K. Pedersen (2000) (default values: kappa=1, p0=0.5, omega=0.5). See the Bio++ description.
YNGKP_M2([genetic_code={genetic code description}, kappa={real>0}, omega0={real>0 and <1}, theta1={real>0 and <1}, omega1={real>1}, theta2={real>0 and <1}, "equilibrium frequencies"]): The M2a model of PAML, see Yang, Z., R. Nielsen, N. Goldman, and A.-M. K. Pedersen (2000), with p0=theta1 and p1=(1-theta1)*theta2 (default values: kappa=1, theta1=0.33333, theta2=0.5, omega0=0.5, omega2=0.5). See the Bio++ description.
YNGKP_M3([genetic_code={genetic code description}, n={integer>0}, kappa={real>0}, omega0={real>0 and <1}, delta1={real>0}, ..., deltan-1={real>0}, theta1={real>0 and <1}, ..., thetan-1={real>0 and <1}, "equilibrium frequencies"]): The M3 model of PAML, see Yang, Z., R. Nielsen, N. Goldman, and A.-M. K. Pedersen (2000), with n discrete values, with p0=theta1 and pk=(1-theta1)*...*(1-thetak)*theta(k+1), and omegak=omega0+delta1+...+deltak (default values: n=3, kappa=1, thetak=1/(n-k+1), omega0=0.5, deltak=0.5). See the Bio++ description.
YNGKP_M7(n={integer>0}, [genetic_code={genetic code description}, kappa={real>0}, p={real>1}, q={real>1}, "equilibrium frequencies"]): The M7 model of PAML, see Yang, Z., R. Nielsen, N. Goldman, and A.-M. K. Pedersen (2000), with the Beta distribution discretized in n classes (default values: kappa=1, p=2, q=2). See the Bio++ description.
YNGKP_M8(n={integer>0}, [genetic_code={genetic code description},kappa={real>0}, omegas={real>1}, p0={real>0},p={real>1}, q={real>1 }, "equilibrium frequencies"]): The M8 model of PAML, see Yang, Z., R. Nielsen, N. Goldman, and A.-M. K. Pedersen (2000), with the Beta distribution discretized in n classes (default values: kappa=1, p=2, q=2, p0=0.5, omegas=2). See the Bio++ description.
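
The theta and delta parameters of YNGKP_M2 and YNGKP_M3 map to class probabilities and omega values through the formulas given above. The following Python sketch works through that mapping for the M3 defaults. The function names are hypothetical, and assigning the remaining probability mass to the last class is an assumption (it is what makes the probabilities sum to 1).

```python
def class_probabilities(thetas):
    """Stick-breaking: p0 = theta1, pk = (1-theta1)...(1-thetak)*theta(k+1).

    Assumption: the last class takes whatever probability mass remains.
    """
    probs = []
    remaining = 1.0
    for theta in thetas:
        probs.append(remaining * theta)
        remaining *= 1.0 - theta
    probs.append(remaining)
    return probs


def class_omegas(omega0, deltas):
    """omegak = omega0 + delta1 + ... + deltak."""
    omegas = [omega0]
    for delta in deltas:
        omegas.append(omegas[-1] + delta)
    return omegas


# M3 defaults for n = 3: thetak = 1/(n-k+1), omega0 = 0.5, deltak = 0.5
n = 3
thetas = [1.0 / (n - k + 1) for k in range(1, n)]
probs = class_probabilities(thetas)
omegas = class_omegas(0.5, [0.5] * (n - 1))
```

With these defaults the three classes are equally likely (theta1=1/3 and theta2=1/2 give probabilities 1/3, 1/3, 1/3), and the omega values are 0.5, 1.0 and 1.5.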

It is also possible to set up more specific models by specifying a nucleotide model for each position. Model parameter names then take the form <codon model name>.<position set>_<position model name>.<position specific parameter name>.
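
As a small illustration of this naming scheme, the Python helper below assembles a parameter name from its four parts. The helper itself is hypothetical and is not part of BppSuite; it only mirrors the pattern described above.

```python
def parameter_name(codon_model, positions, position_model, parameter):
    """Build '<codon model>.<position set>_<position model>.<parameter>'."""
    return f"{codon_model}.{positions}_{position_model}.{parameter}"


# One T92 model shared by the three positions
shared = parameter_name("CodonRate", "123", "T92", "kappa")
# A T92 model specific to the first position
first = parameter_name("CodonRate", "1", "T92", "kappa")
```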

In the following models, the arguments model and model{i} are for descriptions of models on bases.

If the argument is model, the same single-site model is used on all positions (i.e. the parameters are shared between all positions).
If the arguments are model1, model2, model3, each one describes the single-site substitution model of the corresponding codon position. In that case, all single-site model parameters are position dependent.

Each single-site model is normalized, and the substitution rates between codons that differ at more than one position are zero.

The generator is first computed with these models and parameters on the whole triplet alphabet, and then the substitution rates to and from stop codons are set to zero and the generator is normalized with this modification.
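
The construction can be sketched in Python as follows. This is an illustration under stated assumptions, not the Bio++ code: it uses a K80 model on every position, the stop codons of the standard genetic code, and it takes "normalized" to mean that the mean substitution rate at equilibrium equals 1. All function names are hypothetical.

```python
from itertools import product

NUCS = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def k80_rates(kappa):
    """Off-diagonal K80 rates, scaled so each nucleotide leaves at rate 1."""
    transitions = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}
    scale = kappa + 2.0  # one transition plus two transversions per nucleotide
    return {
        (a, b): (kappa if (a, b) in transitions else 1.0) / scale
        for a in NUCS for b in NUCS if a != b
    }


def codon_generator(site_rates):
    """Build a codon generator from three single-site rate tables."""
    triplets = ["".join(c) for c in product(NUCS, repeat=3)]
    # Step 1: rates on the whole triplet alphabet; only single changes are kept
    q = {}
    for x in triplets:
        for y in triplets:
            diff = [p for p in range(3) if x[p] != y[p]]
            if len(diff) == 1:
                p = diff[0]
                q[(x, y)] = site_rates[p][(x[p], y[p])]
    # Step 2: drop the rates to and from stop codons (they are set to zero)
    q = {(x, y): r for (x, y), r in q.items()
         if x not in STOPS and y not in STOPS}
    sense = [c for c in triplets if c not in STOPS]
    # Step 3: normalize. K80 rates are symmetric, so the equilibrium is uniform
    # over sense codons and the mean rate is the average exit rate.
    exit_rate = {x: 0.0 for x in sense}
    for (x, _), r in q.items():
        exit_rate[x] += r
    mean_rate = sum(exit_rate.values()) / len(sense)
    return sense, {pair: r / mean_rate for pair, r in q.items()}


rates = k80_rates(kappa=2.0)
sense, q = codon_generator([rates, rates, rates])
```

The shortcut used in step 3 only holds for symmetric single-site models. For models with unequal equilibrium frequencies, the equilibrium distribution of the modified generator would have to be computed before normalizing.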

CodonRate(model={model name} [, relrate1={real>0}, relrate2={real>0}, "equilibrium frequencies"])

CodonRate(model1={model name}, model2={model name}, model3={model name}[, relrate1={real>0}, relrate2={real>0}, "equilibrium frequencies"])

Substitution model on codons with position specific evolution rates.

The relrate{i} arguments stand for the relative substitution rates of the sites. Default: relrate{i}=1/{4-i}, such that the rate of each site is 1/3.
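
The default values can be checked with a few lines of Python. The sketch assumes that relrate1 and relrate2 are read as successive fractions of the remaining rate, in the same way as the theta parameters of the YNGKP models. That reading is an assumption; it is consistent with the statement that the defaults give each site a rate of 1/3. The function name is hypothetical.

```python
def position_rates(relrate1, relrate2):
    """Relative rates of the three codon positions (assumed stick-breaking)."""
    rate1 = relrate1
    rate2 = (1.0 - relrate1) * relrate2
    rate3 = (1.0 - relrate1) * (1.0 - relrate2)
    return rate1, rate2, rate3


# Defaults: relrate{i} = 1/(4-i), i.e. relrate1 = 1/3 and relrate2 = 1/2
defaults = position_rates(1.0 / (4 - 1), 1.0 / (4 - 2))
```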

alphabet=Codon(letter=DNA, type=Standard)
model=CodonRate(model=T92)

builds a model on codons such that all sites follow the same T92 model. The parameter names are CodonRate.123_T92.kappa, CodonRate.relrate1, CodonRate.relrate2.

alphabet=Codon(letter=DNA, type=Standard)
model=CodonRate(model1=T92, model2=T92, model3=JC69)

builds a model on codons such that the first and second sites follow independent T92 models, and the third site follows a JC69 model. The parameter names are then CodonRate.1_T92.kappa, CodonRate.2_T92.kappa, CodonRate.relrate1, CodonRate.relrate2, and can be initialized as follows:

model=CodonRate(model1=T92(theta=0.5, kappa=2), \
                model2=T92(theta=0.4, kappa=2), model3=JC69)

See the Bio++ description.

CodonDist(model={model name}[, genetic_code={genetic code description}, beta={real>0}, "equilibrium frequencies"])

CodonDist(model1={model name}, model2={model name}, model3={model name}[, genetic_code={genetic code description}, beta={real>0}, "equilibrium frequencies"])

Substitution model on codons that takes into account the difference between synonymous and non-synonymous substitutions.

Optional argument beta is the ratio between non-synonymous substitution rate and synonymous substitution rate. Default value: 1.
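
The role of beta can be pictured with the Python sketch below, which returns the factor applied to a single-nucleotide codon change. It is a simplified reading of the description above, not the Bio++ implementation: the function name is hypothetical, and the standard genetic code table is an assumption.

```python
# Standard genetic code, codons ordered with nucleotides T, C, A, G
NUCS = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(NUCS)
        for j, b in enumerate(NUCS)
        for k, c in enumerate(NUCS)}


def beta_multiplier(source, target, beta):
    """Factor applied to the rate of the change source -> target.

    0 for changes involving a stop codon or more than one nucleotide,
    1 for synonymous changes, beta for non-synonymous changes.
    """
    if CODE[source] == "*" or CODE[target] == "*":
        return 0.0
    if sum(a != b for a, b in zip(source, target)) != 1:
        return 0.0
    return 1.0 if CODE[source] == CODE[target] else beta


synonymous = beta_multiplier("TTT", "TTC", beta=0.3)      # Phe -> Phe
non_synonymous = beta_multiplier("TTT", "TTA", beta=0.3)  # Phe -> Leu
```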

alphabet=Codon(letter=DNA, type=Standard)
model=CodonDist(model=T92)

builds a model on codons such that all sites follow the same T92 model. The parameter names are CodonDist.123_T92.kappa and CodonDist.beta.

alphabet=Codon(letter=DNA, type=Standard)
model=CodonDist(model1=T92, model2=T92, model3=JC69)

builds a model on codons such that the first and second sites follow independent T92 models, and the third site follows a JC69 model. The parameter names are then CodonDist.1_T92.kappa, CodonDist.2_T92.kappa, CodonDist.beta.

See the Bio++ description.

CodonRateFreq(model={model name}, frequencies={frequencies set description}[, relrate1={real>0}, relrate2={real>0}, "equilibrium frequencies"])

CodonRateFreq(model1={model name}, model2={model name}, model3={model name}, frequencies={frequencies set description} [, relrate1={real>0}, relrate2={real>0}, "equilibrium frequencies"])

Substitution model on codons with position-specific evolution rates, where the substitution rates are multiplied by the frequency of the target codon in the given frequencies set.

This model should be used with nucleotide models whose equilibrium distribution is fixed and does not depend on the parameters. Otherwise the parameters may not be identifiable.

The multiplicative distribution of the model is described by the frequencies argument. See the description of the Frequencies Set below.
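
The effect of multiplying by the target frequency can be seen on a toy example. The Python sketch below uses three abstract states instead of 61 codons, with hypothetical rates and frequencies. It checks that when the base rates are symmetric, the scaled rates satisfy detailed balance with respect to the given frequencies, so those frequencies are the equilibrium of the scaled process.

```python
def scale_by_target_frequency(rates, freqs):
    """Multiply each rate i -> j by the frequency of the target state j."""
    return {(i, j): r * freqs[j] for (i, j), r in rates.items()}


# Toy 3-state example with symmetric base rates (hypothetical values)
base = {
    ("a", "b"): 1.0, ("b", "a"): 1.0,
    ("a", "c"): 2.0, ("c", "a"): 2.0,
    ("b", "c"): 0.5, ("c", "b"): 0.5,
}
freqs = {"a": 0.5, "b": 0.3, "c": 0.2}
scaled = scale_by_target_frequency(base, freqs)

# Equilibrium flux from i to j: freqs[i] * scaled rate i -> j
flux = {(i, j): freqs[i] * r for (i, j), r in scaled.items()}
```

Here flux[("a", "b")] equals flux[("b", "a")], and likewise for the other pairs.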

Each single-site model is normalized, and the substitution rates between codons that differ at more than one position are zero.

The relrate{i} arguments stand for the relative substitution rates of the sites. Default: relrate{i}=1/{4-i}, such that the rate of each site is 1/3.

alphabet=Codon(letter=DNA, type=Standard)
model=CodonRateFreq(model=K80, frequencies=Full())

has parameters CodonRateFreq.123_K80.kappa, CodonRateFreq.Full.theta_1, ..., CodonRateFreq.Full.theta_60, CodonRateFreq.relrate1, CodonRateFreq.relrate2.

See the Bio++ description.

CodonDistFreq(model={model name}, frequencies={frequencies set description} [, genetic_code={genetic code description}, beta={real>0}, "equilibrium frequencies"])

CodonDistFreq(model1={model name}, model2={model name}, model3={model name}, frequencies={frequencies set description} [, genetic_code={genetic code description}, beta={real>0}, "equilibrium frequencies"])

Substitution model on codons that takes into account the difference between synonymous and non-synonymous substitutions. Moreover, the substitution rates are multiplied by the frequency of the target codon in the given frequencies set.

This model should be used with nucleotide models whose equilibrium distribution is fixed and does not depend on the parameters. Otherwise the parameters may not be identifiable.

The multiplicative distribution of the model is described by the frequencies argument. See the description of the Frequencies Set below.

Optional argument beta is the ratio between non-synonymous substitution rate and synonymous substitution rate. Default value: 1.

alphabet=Codon(letter=DNA, type=Standard)
model=CodonDistFreq(model=T92, frequencies=Full())

has parameters CodonDistFreq.123_T92.kappa, CodonDistFreq.Full.theta_1, ..., CodonDistFreq.Full.theta_60, CodonDistFreq.beta.

See the Bio++ description.

CodonDistPhasFreq(model={model name}, frequencies={frequencies set description} [, genetic_code={genetic code description}, beta={real>0}])

CodonDistPhasFreq(model1={model name}, model2={model name}, model3={model name}, frequencies={frequencies set description} [, genetic_code={genetic code description}, beta={real>0}])

Substitution model on codons that takes into account the difference between synonymous and non-synonymous substitutions. Moreover, the substitution rates are multiplied by the product of the frequencies of the changed nucleotides – conditioned on the phase – in the given frequencies set.

This model should be used with nucleotide models whose equilibrium distribution is fixed and does not depend on the parameters. Otherwise the parameters may not be identifiable.

The multiplicative distribution of the model is described by the frequencies argument. See the description of the Frequencies Set below.
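
One possible reading of "frequencies of the changed nucleotides conditioned on the phase" is sketched below in Python: the nucleotide frequencies at each codon position are taken as marginals of the codon frequencies set, and the multiplier is the product of those marginals over the positions that change. This interpretation, the function names, and the stop codons of the standard genetic code are assumptions made for illustration; the Bio++ description remains the reference.

```python
from itertools import product

NUCS = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def phase_frequencies(codon_freqs):
    """Marginal nucleotide frequencies at each of the three codon positions."""
    marginals = [{n: 0.0 for n in NUCS} for _ in range(3)]
    for codon, freq in codon_freqs.items():
        for pos, nuc in enumerate(codon):
            marginals[pos][nuc] += freq
    return marginals


def phase_multiplier(source, target, marginals):
    """Product of the phase-specific frequencies of the nucleotides that change."""
    factor = 1.0
    for pos in range(3):
        if source[pos] != target[pos]:
            factor *= marginals[pos][target[pos]]
    return factor


# Example: uniform frequencies over the 61 sense codons
sense = ["".join(c) for c in product(NUCS, repeat=3) if "".join(c) not in STOPS]
uniform = {codon: 1.0 / len(sense) for codon in sense}
marginals = phase_frequencies(uniform)
factor = phase_multiplier("AAA", "AAT", marginals)
```

Even with uniform codon frequencies the marginals differ between positions, because the stop codons are excluded: T has frequency 13/61 at the first position and 16/61 at the third.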

Optional argument beta is the ratio between non-synonymous substitution rate and synonymous substitution rate. Default value: 1.

See the Bio++ description.

CodonDistFitPhasFreq(model={model name}, frequencies={frequencies set description}, fitness={frequencies set description} [, genetic_code={genetic code description}, beta={real>0}])

CodonDistFitPhasFreq(model1={model name}, model2={model name}, model3={model name}, frequencies={frequencies set description}, fitness={frequencies set description} [, genetic_code={genetic code description}, beta={real>0}])

Substitution model on codons that takes into account the difference between synonymous and non-synonymous substitutions and the difference between synonymous codons, in the same manner as in Yang and Nielsen’s 2008 substitution model. The substitution rates are multiplied by the product of the frequencies of the changed nucleotides – conditioned on the phase – in the given frequencies set, and by ratios of fitnesses of the codons.

The multiplicative distribution of the model is described by the frequencies and fitness arguments. See the description of the Frequencies Set below.

Optional argument beta is the ratio between non-synonymous substitution rate and synonymous substitution rate. Default value: 1.

See the Bio++ description.