pharokka

pharokka is a fast phage annotation pipeline.
If you like pharokka, you will probably love phold. phold
uses structural homology to improve phage annotation. Benchmarking is
ongoing but phold strongly outperforms
pharokka in terms of annotation, particularly for less
characterised phages such as those from metagenomic datasets.
pharokka still has features that phold currently lacks
(identification of tRNAs, tmRNAs and CRISPR repeats, and the INPHARED
taxonomy search), so it is recommended to run phold after running
pharokka.
phold takes the GenBank output of pharokka as input.
Therefore, if you have already annotated your phage(s) with pharokka,
you can easily update the annotation with more functional predictions
using phold.
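
The two-step workflow described here (pharokka first, then phold on pharokka's GenBank output) could be scripted roughly as follows. This is a minimal sketch, not the authors' own wrapper: the helper names `pharokka_cmd` and `phold_cmd`, the paths and the thread count are hypothetical, and the flags (`-i`, `-o`, `-d`, `-t` for `pharokka.py`; `phold run -i -o -t`) and the default output name `pharokka.gbk` are assumptions that should be checked against each tool's `--help` for your installed version.

```python
import shlex
from pathlib import Path


def pharokka_cmd(fasta, outdir, db_dir, threads=8):
    """Build the (assumed) pharokka command line as an argument list."""
    return [
        "pharokka.py",
        "-i", str(fasta),    # input phage genome(s) in FASTA format
        "-o", str(outdir),   # output directory
        "-d", str(db_dir),   # directory holding the pharokka databases
        "-t", str(threads),
    ]


def phold_cmd(pharokka_outdir, outdir, threads=8, prefix="pharokka"):
    """Build the (assumed) phold command, fed with pharokka's GenBank output."""
    # Assumption: pharokka writes <prefix>.gbk into its output directory.
    genbank = Path(pharokka_outdir) / f"{prefix}.gbk"
    return ["phold", "run", "-i", str(genbank), "-o", str(outdir), "-t", str(threads)]


if __name__ == "__main__":
    steps = [
        pharokka_cmd("phage.fasta", "pharokka_out", "pharokka_db"),
        phold_cmd("pharokka_out", "phold_out"),
    ]
    for cmd in steps:
        # Print only; to execute, pass each list to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Building the commands as argument lists rather than shell strings avoids quoting problems with unusual file names and lets you inspect them before anything is executed.
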
If you don’t want to install pharokka or
phold locally, you can run pharokka and
phold, or only pharokka, without any code
using the Google Colab notebook: https://colab.research.google.com/github/gbouras13/pharokka/blob/master/run_pharokka_and_phold.ipynb
pharokka uses PHANOTATE, the only
gene prediction program tailored to bacteriophages, as the default
program for gene prediction. Prodigal (implemented with
pyrodigal) and Prodigal-gv
(implemented with pyrodigal-gv) are
also available as alternatives. Following gene prediction, functional annotations
are assigned by matching each predicted coding sequence (CDS) to the PHROGs, CARD and VFDB databases using MMseqs2. As of v1.4.0,
pharokka also matches each CDS to the PHROGs database
with more sensitive Hidden Markov Models, using PyHMMER. Pharokka’s main
output is a GFF file suitable for use in downstream pangenomic
pipelines like Roary.
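
As a quick sanity check before passing the GFF output to a downstream tool, the feature types it contains can be tallied with the standard library alone. The sketch below assumes only the general GFF3 layout (nine tab-separated columns, feature type in the third, an optional `##FASTA` section at the end); the function name `count_gff_feature_types` and the example records are hypothetical and are not taken from real pharokka output.

```python
from collections import Counter


def count_gff_feature_types(lines):
    """Count column-3 feature types in GFF3 text, given as an iterable of lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        fields = line.split("\t")
        if len(fields) == 9:  # a GFF3 feature line has exactly nine columns
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = [
        "##gff-version 3",
        "contig_1\tPHANOTATE\tCDS\t1\t300\t.\t+\t0\tID=cds_1",
        "contig_1\tPHANOTATE\tCDS\t350\t900\t.\t-\t0\tID=cds_2",
        "contig_1\ttRNAscan-SE\ttRNA\t950\t1020\t.\t+\t.\tID=trna_1",
        "##FASTA",
        ">contig_1",
        "ATGC",
    ]
    for feature_type, n in sorted(count_gff_feature_types(example).items()):
        print(feature_type, n)
```

Because the function accepts any iterable of lines, an open file handle on the GFF file in your output directory can be passed to it directly.
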
pharokka also generates a cds_functions.tsv
file, which includes counts of CDSs, tRNAs, tmRNAs, CRISPRs and
functions assigned to CDSs according to the PHROGs database. See the
full usage and check out the full documentation for more
details.
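
The cds_functions.tsv summary can be loaded with Python's `csv` module for further tabulation. The sketch below is illustrative only: the column names `Description`, `Count` and `contig`, the helper name `read_function_counts` and the example rows are assumptions, so check the header of your own file and adjust `name_col` and `count_col` to match.

```python
import csv
import io


def read_function_counts(handle, name_col="Description", count_col="Count"):
    """Sum counts per description from a tab-separated table with a header row."""
    totals = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # Rows with the same description (e.g. from different contigs) are added up.
        totals[row[name_col]] = totals.get(row[name_col], 0) + int(row[count_col])
    return totals


if __name__ == "__main__":
    # Made-up table for illustration; the column names are assumptions.
    example = io.StringIO(
        "Description\tCount\tcontig\n"
        "CDS\t10\tphage_1\n"
        "tRNAs\t2\tphage_1\n"
        "CDS\t5\tphage_2\n"
    )
    for description, total in read_function_counts(example).items():
        print(description, total)
```
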
For more information, please read the pharokka
manuscript:
George Bouras, Roshan Nepal, Ghais Houtak, Alkis James Psaltis, Peter-John Wormald, Sarah Vreugde, Pharokka: a fast scalable bacteriophage annotation tool, Bioinformatics, Volume 39, Issue 1, January 2023, btac776, https://doi.org/10.1093/bioinformatics/btac776