phold is a sensitive annotation tool for bacteriophage genomes and metagenomes using protein structural homology.
phold uses the ProstT5 protein language model to translate protein amino acid sequences to the 3Di token alphabet used by Foldseek. Foldseek is then used to search these 3Di sequences against a database of 803k protein structures, mostly predicted using ColabFold.
Alternatively, you can specify protein structures that you have pre-computed for your phage(s) instead of using ProstT5.
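As a rough illustration of how the pipeline described above is typically launched, here is a minimal Python sketch that assembles a `phold run` command line. The `-i`, `-o` and `-t` options (input file, output directory, threads) are assumed from phold's documented usage and should be checked against `phold run -h` for your installed version. The helper name `build_phold_run_command` and the file names are hypothetical, chosen only for this example.

```python
import shlex


def build_phold_run_command(input_path, output_dir, threads=8):
    """Assemble a `phold run` command as an argument list.

    Assumption: -i (input), -o (output directory) and -t (threads)
    match your installed phold version; verify with `phold run -h`.
    """
    return [
        "phold", "run",
        "-i", str(input_path),
        "-o", str(output_dir),
        "-t", str(threads),
    ]


# Hypothetical file names, for illustration only.
cmd = build_phold_run_command("my_phage.gbk", "phold_output", threads=8)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once phold and its database are installed. Building the command as a list rather than a single string avoids shell-quoting problems with unusual file names.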
The phold database consists of approximately 803k protein structures generated using ColabFold from the following databases:
If you don’t want to install phold locally, you can run it without any code using one of the following Google Colab notebooks:
pharokka + phold (recommended): https://colab.research.google.com/github/gbouras13/phold/blob/main/run_pharokka_and_phold.ipynb
phold: https://colab.research.google.com/github/gbouras13/phold/blob/main/run_phold.ipynb