

3.5.3 Frequency sets

The following frequency distributions are available:

Fixed()

All frequencies are fixed to their initial value and are not estimated.

GC(theta={real]0,1[})

For nucleotides only, set the G content equal to the C content.
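As a rough illustration of what this constraint implies, the sketch below maps a GC content theta to four nucleotide frequencies. The function name gc_frequencies is hypothetical, and the sketch assumes that A and T share the remaining 1 - theta equally, which is what a single free parameter suggests. It is not the library's own code.

```python
def gc_frequencies(theta):
    """Map a GC content theta in ]0,1[ to nucleotide frequencies.

    Assumption: G = C = theta / 2 and A = T = (1 - theta) / 2.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    at = (1.0 - theta) / 2.0  # share given to each of A and T
    gc = theta / 2.0          # share given to each of G and C
    return {"A": at, "C": gc, "G": gc, "T": at}


if __name__ == "__main__":
    print(gc_frequencies(0.6))
```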

Full(theta1={real]0,1[}, theta2={real]0,1[}, ..., thetaN={real]0,1[})

Full parametrization. Contains N free parameters, where N is the size of the alphabet minus 1. For codon models, N is the size of the alphabet minus 1 minus the number of stop codons, whose frequencies are set to 0. For nucleotide sequences, the parameters are theta, the GC content; theta1, the proportion of A over A+T; and theta2, the proportion of G over G+C.
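The nucleotide case can be worked through numerically. The sketch below derives the four frequencies directly from the three definitions given above (GC content, A over A+T, G over G+C). The function name full_nucleotide_frequencies is hypothetical, and the code is only an illustration of the parametrization, not the library's implementation.

```python
def full_nucleotide_frequencies(theta, theta1, theta2):
    """Return nucleotide frequencies from the three parameters.

    theta  : G + C content
    theta1 : A / (A + T)
    theta2 : G / (G + C)
    """
    for name, value in (("theta", theta), ("theta1", theta1), ("theta2", theta2)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    return {
        "A": (1.0 - theta) * theta1,          # AT share, split by theta1
        "C": theta * (1.0 - theta2),          # GC share, remainder after G
        "G": theta * theta2,                  # GC share, split by theta2
        "T": (1.0 - theta) * (1.0 - theta1),  # AT share, remainder after A
    }


if __name__ == "__main__":
    print(full_nucleotide_frequencies(0.5, 0.5, 0.5))
```

With all three parameters at 0.5, the four frequencies are equal, which corresponds to a balanced starting point.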

Word(frequency={frequency set description})

or

Word(frequency1={frequency set description}, frequency2={frequency set description}, ..., frequencyn={frequency set description})

Frequencies on words are computed as the product of the frequencies on the letters. The arguments frequency and frequency{i} describe frequency sets on single sites, such as nucleotides or proteins. The alphabet must be a Word alphabet.

If the argument is frequency, the number of multiplied single-site frequencies is the length of the words in the alphabet, and the same single-site frequency set is used (i.e. the parameters are shared between all positions).

If the arguments are frequency1, ..., frequency{n}, the length of the words in the alphabet must be n, and all single site frequency sets are independent. In that case, all single site frequency set parameters are position dependent.

alphabet=Word(letter=DNA,length=4)
Word(frequency=GC())

builds a frequency set on 4-base words, such that the frequencies at all sites follow the same GC frequency set model. The parameter name is 1234_GC.theta.

alphabet=Word(letter=DNA,length=4)
Word(frequency1=GC(),frequency2=GC(),frequency3=Fixed(),\
                      frequency4=Full())

builds a frequency set on 4-base words, such that the first and second sites follow independent GC frequency sets, the third site follows a Fixed frequency set, and the fourth site follows a Full frequency set. The parameter names are then 1_GC.theta, 2_GC.theta, 4_Full.theta_1, 4_Full.theta_2, 4_Full.theta_3.
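The product rule for Word frequency sets can be sketched as follows. The function word_frequencies and the numeric values are hypothetical and chosen only for illustration; the sketch takes one letter-frequency dictionary per position and multiplies them, so passing the same dictionary at every position mimics the shared (frequency=) form and passing different ones mimics the position-dependent (frequency1=, ...) form.

```python
from itertools import product


def word_frequencies(site_freqs):
    """Return word frequencies as the product of per-position letter frequencies.

    site_freqs: list of dicts (letter -> frequency), one dict per word position.
    """
    letters = [sorted(freqs) for freqs in site_freqs]
    words = {}
    for combo in product(*letters):
        p = 1.0
        for position, letter in enumerate(combo):
            p *= site_freqs[position][letter]
        words["".join(combo)] = p
    return words


if __name__ == "__main__":
    gc_like = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}      # illustrative values
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # illustrative values
    shared = word_frequencies([gc_like] * 4)                # same set everywhere
    mixed = word_frequencies([gc_like, gc_like, uniform, uniform])
    print(len(shared), shared["ACGT"], mixed["ACGT"])
```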

Codon(frequency={frequency set description})

or

Codon(frequency1={frequency set description}, frequency2={frequency set description}, frequency3={frequency set description})

Frequencies on codons are computed as the product of the frequencies on the letters, with stop codon frequencies set to zero. The arguments frequency and frequency{i} describe frequency sets on nucleotides. The alphabet must be a Codon alphabet.

If the argument is frequency, the same single-site frequency set is used (i.e. the parameters are shared between all positions).

If the arguments are frequency1, frequency2, frequency3, all single site frequency sets are independent. In that case, all single site frequency set parameters are position dependent.

alphabet=Codon(letter=DNA, type=Standard)
Codon(frequency=GC())

builds a frequency set on codons, such that the frequencies at all sites follow the same GC frequency set model. The parameter name is 123_GC.theta.

alphabet=Codon(letter=DNA, type=Standard)
Codon(frequency1=GC(),frequency2=GC(),frequency3=Fixed())

builds a frequency set on codons, such that the first and second sites follow independent GC frequency sets and the third site follows a Fixed frequency set. The parameter names are then 1_GC.theta, 2_GC.theta.
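A minimal sketch of the codon construction follows. The function codon_frequencies is hypothetical. It multiplies the three position-specific nucleotide frequencies and sets stop codons to zero, as described above. Rescaling the remaining values so that they sum to one is an assumption of this sketch; the text does not say how the probability mass of the stop codons is redistributed.

```python
from itertools import product

# Stop codons of the standard genetic code.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def codon_frequencies(site_freqs, stops=STANDARD_STOPS):
    """Return codon frequencies from three per-position nucleotide frequency dicts.

    Stop codons get frequency 0. Assumption: the other codons are rescaled
    proportionally so that the result sums to one.
    """
    raw = {}
    for first, second, third in product("ACGT", repeat=3):
        codon = first + second + third
        if codon in stops:
            raw[codon] = 0.0
        else:
            raw[codon] = (
                site_freqs[0][first] * site_freqs[1][second] * site_freqs[2][third]
            )
    total = sum(raw.values())
    return {codon: p / total for codon, p in raw.items()}


if __name__ == "__main__":
    gc_like = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}  # illustrative values
    freqs = codon_frequencies([gc_like] * 3)
    print(freqs["TAA"], freqs["ATG"])
```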

Predefined codon frequencies are available, with a syntax similar to the one used in the PAML software. See the Codon Models section above.

All functions accept the following arguments, which take priority over the parameter specification:

init={balanced,observed}

Set all frequencies to the same value (balanced), or set them from the observed counts (observed).

observedPseudoCount={integer}

If the frequencies are set from observed counts, this pseudocount is added to every count.
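The effect of a pseudocount can be seen with a small numerical sketch. The function observed_frequencies and the counts are hypothetical; the code adds the pseudocount to every count and then normalizes, which keeps unobserved states from receiving a frequency of exactly zero.

```python
def observed_frequencies(counts, pseudo_count=0):
    """Turn state counts into frequencies after adding pseudo_count to each count."""
    adjusted = {state: n + pseudo_count for state, n in counts.items()}
    total = sum(adjusted.values())
    if total <= 0:
        raise ValueError("the adjusted counts must sum to a positive number")
    return {state: n / total for state, n in adjusted.items()}


if __name__ == "__main__":
    counts = {"A": 30, "C": 0, "G": 10, "T": 60}  # illustrative counts
    print(observed_frequencies(counts))                  # C has frequency 0
    print(observed_frequencies(counts, pseudo_count=1))  # C becomes positive
```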

values=({vector<double>})

Explicitly set all frequencies. The input vector must have one entry per resolved state in the alphabet, list the states in alphabetical order, and sum to one.
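Since the ordering and the sum are easy to get wrong by hand, a small helper can build the argument. The function format_values is hypothetical. It assumes that a plain string sort of the state names matches the alphabetical order expected here, and that the vector is written as comma-separated numbers inside parentheses.

```python
import math


def format_values(freqs):
    """Build a values=(...) argument from a dict mapping state -> frequency.

    Assumptions: states are ordered by plain string sort, and the vector is
    written as comma-separated numbers in parentheses.
    """
    if any(v < 0 for v in freqs.values()):
        raise ValueError("frequencies must be non-negative")
    if not math.isclose(sum(freqs.values()), 1.0, abs_tol=1e-9):
        raise ValueError("frequencies must sum to one")
    ordered = [freqs[state] for state in sorted(freqs)]
    return "values=(" + ",".join(repr(v) for v in ordered) + ")"


if __name__ == "__main__":
    print(format_values({"T": 0.4, "A": 0.1, "G": 0.3, "C": 0.2}))
```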

