Data Recoding

To aid computation, CodonW automatically converts sequence information from its original text format into a numerical format. This is normally transparent to the user. However, to add additional genetic codes, or to supply a personal choice of codon values for calculating the Fop, CAI or CBI indices, some understanding of the scheme used to convert the sequences to numerical strings is advisable.

The indices Fop, CBI and CAI are measures of codon bias in relation to the codon usage of a set of optimal genes, and there is an option of using a personal choice of these values when calculating them. The values are read from a file. There must be one value for each codon (64 in total), and the values must appear in the file in a set sequence: the numerical order of the codons (TTT, TCT ... GAG, GGG). This is also the order in which codon and amino acid results are written to file.

Internally, CodonW recodes all nucleotides, codons and amino acids. Nucleotides are recoded as T/U=1, C=2, A=3, G=4. The 20 standard amino acids and the termination codons are recoded as integer values in the range 1 to 21; note that stop codons are assigned the amino acid value 11 (see Table 2). Decisions about whether a codon is synonymous, and about how many members a particular synonymous amino acid family has, are taken at run time and depend on the genetic code chosen.

Each codon is recoded into an integer value in the range 1 to 64 (see Table 1). The formula used to recode the codons is:

Equation 1    code = ((P1 - 1) * 16) + P2 + ((P3 - 1) * 4)    1 <= code <= 64

where P1, P2 and P3 are the recoded values of the nucleotides at the first, second and third codon positions. Using this recoding convention, the codon ATG (A=3, T=1, G=4) has the value 45:

code = ((3 - 1) * 16) + 1 + ((4 - 1) * 4) = 45

Unrecognised or non-translatable bases, codons and amino acids are all assigned the value zero.
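The recoding described above can be sketched in a few lines of Python. This is an illustration only and is not taken from the CodonW source code: the names NUCLEOTIDE_CODES, recode_codon, codons_in_numerical_order and label_codon_values are hypothetical, and the sketch assumes that the 64 personal values have already been read from file into a list.

```python
# Nucleotide recoding as stated in the text: T/U=1, C=2, A=3, G=4.
NUCLEOTIDE_CODES = {"T": 1, "U": 1, "C": 2, "A": 3, "G": 4}


def recode_codon(codon):
    """Apply Equation 1; return 0 for unrecognised or non-translatable codons."""
    codon = codon.upper()
    if len(codon) != 3:
        return 0
    try:
        p1, p2, p3 = (NUCLEOTIDE_CODES[base] for base in codon)
    except KeyError:
        # Any base outside T/U/C/A/G makes the whole codon untranslatable.
        return 0
    return (p1 - 1) * 16 + p2 + (p3 - 1) * 4


def codons_in_numerical_order():
    """List the 64 codons sorted by their recoded value (TTT, TCT ... GAG, GGG)."""
    bases = "TCAG"
    by_code = {}
    for first in bases:
        for second in bases:
            for third in bases:
                codon = first + second + third
                by_code[recode_codon(codon)] = codon
    return [by_code[code] for code in range(1, 65)]


def label_codon_values(values):
    """Pair 64 user-supplied values (hypothetical input) with their codons."""
    if len(values) != 64:
        raise ValueError("expected one value for each of the 64 codons")
    return dict(zip(codons_in_numerical_order(), values))


if __name__ == "__main__":
    print(recode_codon("ATG"))
    print(codons_in_numerical_order()[:4])
```

Note that the second codon position varies fastest in this ordering, which is why TCT (code 2) follows TTT (code 1) rather than TTC (code 5).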
Table 1 Numerical values used for recoding codons

Code  Codon  AA     Code  Codon  AA     Code  Codon  AA     Code  Codon  AA
1     UUU    Phe    2     UCU    Ser    3     UAU    Tyr    4     UGU    Cys
5     UUC           6     UCC           7     UAC           8     UGC
9     UUA    Leu    10    UCA           11    UAA    STOP   12    UGA    STOP
13    UUG           14    UCG           15    UAG           16    UGG    Trp
17    CUU           18    CCU    Pro    19    CAU    His    20    CGU    Arg
21    CUC           22    CCC           23    CAC           24    CGC
25    CUA           26    CCA           27    CAA    Gln    28    CGA
29    CUG           30    CCG           31    CAG           32    CGG
33    AUU    Ile    34    ACU    Thr    35    AAU    Asn    36    AGU    Ser
37    AUC           38    ACC           39    AAC           40    AGC
41    AUA           42    ACA           43    AAA    Lys    44    AGA    Arg
45    AUG    Met    46    ACG           47    AAG           48    AGG
49    GUU    Val    50    GCU    Ala    51    GAU    Asp    52    GGU    Gly
53    GUC           54    GCC           55    GAC           56    GGC
57    GUA           58    GCA           59    GAA    Glu    60    GGA
61    GUG           62    GCG           63    GAG           64    GGG

Table 2 Numerical values used to recode amino acids

Code  AA    One letter code    Code  AA    One letter code
1     Phe   F                  2     Leu   L
3     Ile   I                  4     Met   M
5     Val   V                  6     Ser   S
7     Pro   P                  8     Thr   T
9     Ala   A                  10    Tyr   Y
11    Stop  *                  12    His   H
13    Gln   Q                  14    Asn   N
15    Lys   K                  16    Asp   D
17    Glu   E                  18    Cys   C
19    Trp   W                  20    Arg   R
21    Gly   G
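As an illustration of how Table 2 might be applied, the following Python sketch recodes a protein sequence written in one-letter codes. The names AMINO_ACID_CODES and recode_protein are hypothetical and are not part of CodonW; the values are copied from Table 2, and anything not listed there is given the value zero, as described in the text.

```python
# One-letter amino acid codes mapped to the integer values of Table 2.
AMINO_ACID_CODES = {
    "F": 1, "L": 2, "I": 3, "M": 4, "V": 5, "S": 6, "P": 7,
    "T": 8, "A": 9, "Y": 10, "*": 11, "H": 12, "Q": 13, "N": 14,
    "K": 15, "D": 16, "E": 17, "C": 18, "W": 19, "R": 20, "G": 21,
}


def recode_protein(sequence):
    """Return the Table 2 value for each residue; unrecognised residues become 0."""
    return [AMINO_ACID_CODES.get(residue.upper(), 0) for residue in sequence]


if __name__ == "__main__":
    # M=4, K=15, W=19, stop=11, and X is unrecognised (0).
    print(recode_protein("MKW*X"))
```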