

C2A / A2C

C2A and A2C are command-line programs written in Java. C2A translates a FASTA-formatted codon sequence file into amino-acid sequences, and A2C back-translates a FASTA-formatted amino-acid sequence alignment into the corresponding codon alignment. These tools were implemented to make it easy to infer multiple sequence alignments at the codon level.


Run C2A without any argument to display the following documentation:


  USAGE:   C2A  <seq.fna>

  where <seq.fna> is a FASTA-formatted codon sequence file. This will
  output in  stdout the  translation (standard  genetic code) of each
  sequence in the same format.
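
The translation step can be sketched as follows. This is a minimal illustration of translating with the standard genetic code, not the actual C2A source: the class and method names are hypothetical, and the handling of ambiguous codons (written as X) and of trailing incomplete codons (ignored) is an assumption.

```java
class C2ASketch {
    // Standard genetic code, codons enumerated with base order T, C, A, G
    // (first base varies slowest, third base fastest).
    static final String BASES = "TCAG";
    static final String AMINO =
        "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR" + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

    // Translates one codon; assumption: any non-ACGT character yields 'X'.
    static char translateCodon(String codon) {
        int index = 0;
        for (int i = 0; i < 3; i++) {
            int b = BASES.indexOf(Character.toUpperCase(codon.charAt(i)));
            if (b < 0) return 'X';
            index = 4 * index + b;
        }
        return AMINO.charAt(index);
    }

    // Translates a codon sequence in frame 1; assumption: a trailing
    // incomplete codon (1 or 2 bases) is silently ignored.
    static String translate(String dna) {
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            protein.append(translateCodon(dna.substring(i, i + 3)));
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("ATGGCCAAATGA")); // prints MAK*
    }
}
```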

Run A2C without any argument to display the following documentation:


  USAGE:   A2C  <ali.faa>  <seq.fna>

  where <ali.faa> is  a FASTA-formatted  multiple amino acid sequence
  alignment file  and <seq.fna> a FASTA-formatted file containing the
  associated codon sequences. This will output in stdout the multiple
  back-translated sequence alignment.
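
The back-translation idea can be sketched as follows: each aligned residue consumes the next codon of the unaligned sequence, and each gap character becomes a three-character gap. This is a minimal illustration under stated assumptions, not the actual A2C source: the class and method names are hypothetical, the gap character is assumed to be '-', and the sketch does not check that each codon actually encodes the residue it is paired with.

```java
class A2CSketch {
    // Threads the unaligned codon sequence through one aligned amino-acid row.
    static String backTranslate(String alignedProtein, String codons) {
        StringBuilder out = new StringBuilder();
        int pos = 0; // position of the next unused codon
        for (char aa : alignedProtein.toCharArray()) {
            if (aa == '-') {
                out.append("---");               // one residue gap = one codon gap
            } else {
                if (pos + 3 > codons.length()) {
                    throw new IllegalArgumentException("codon sequence is too short");
                }
                out.append(codons, pos, pos + 3); // copy the matching codon
                pos += 3;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(backTranslate("M-AK", "ATGGCCAAA")); // prints ATG---GCCAAA
    }
}
```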


To illustrate the usefulness of C2A and A2C, the directory example contains FASTA files from the study of Drini et al. (2016). The first file, seq.fna, contains several Leishmania and Trypanosoma codon sequences from the sub-family HSPA1. To easily infer an accurate multiple sequence alignment at the codon level, C2A and A2C can be used together with a standard multiple sequence alignment program.

First, C2A is used to create the file seq.faa, which contains the translation of every codon sequence in seq.fna:

C2A  seq.fna  >  seq.faa

Second, seq.faa can be used to infer a multiple amino-acid sequence alignment, which is expected to be more accurate than one inferred directly from the initial codon sequences. The directory example contains such an alignment in the file ali.faa.

Finally, A2C is used to create the file ali.fna by back-translating the aligned amino-acid sequences in ali.faa using the associated codon sequences in seq.fna:

A2C  ali.faa  seq.fna  >  ali.fna

With this procedure, the file ali.fna contains an accurate multiple sequence alignment at the codon level, i.e. homology is recovered at each codon position.
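
As an optional sanity check, a codon-level alignment produced this way should keep every gap in frame. The sketch below tests that property on sequences already loaded as strings; the class and method names are hypothetical, the gap character is assumed to be '-', and FASTA parsing is left out.

```java
import java.util.List;

class CodonAlignmentCheck {
    // Returns true when all rows have the same length, that length is a
    // multiple of 3, and every in-frame triplet is either gap-free or "---".
    static boolean isCodonAligned(List<String> rows) {
        if (rows.isEmpty()) return false;
        int length = rows.get(0).length();
        if (length % 3 != 0) return false;
        for (String row : rows) {
            if (row.length() != length) return false;
            for (int i = 0; i < length; i += 3) {
                String triplet = row.substring(i, i + 3);
                boolean hasGap = triplet.indexOf('-') >= 0;
                if (hasGap && !triplet.equals("---")) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two toy rows with in-frame gaps: prints true
        System.out.println(isCodonAligned(List.of("ATG---GCCAAA", "ATGTTTGCC---")));
        // Single-base gaps break the reading frame: prints false
        System.out.println(isCodonAligned(List.of("ATG-GCCAAA", "ATGTTTGCC-")));
    }
}
```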


Drini S, Criscuolo A, Lechat P, Imamura H, Skalický T, Rachidi N, Lukeš J, Dujardin JC, Späth GF (2016) Species- and strain-specific adaptation of the HSP70 super family in pathogenic Trypanosomatids. Genome Biology and Evolution, 8(6):1980-1995. doi:10.1093/gbe/evw140.