TL;DR Platon detects plasmid contigs within bacterial draft genomes from WGS short-read assemblies. To this end, Platon analyzes the natural distribution biases of certain protein-coding genes between chromosomes and plasmids. This analysis is complemented by comprehensive contig characterizations, to which several heuristics are applied.

Input/Output

Input

Platon accepts draft assemblies in fasta format. If contigs have been assembled with SPAdes, Platon is able to extract the coverage information from the contig names.
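To illustrate what extracting coverage from contig names can look like, here is a minimal Python sketch. It assumes the usual SPAdes naming scheme (`NODE_<n>_length_<len>_cov_<cov>`). The regular expression and the function name `spades_coverage` are hypothetical and are not taken from Platon's source code.

```python
import re
from typing import Optional

# SPAdes names contigs like "NODE_1_length_52340_cov_23.817".
# This pattern is an illustrative assumption, not Platon's own parser.
SPADES_NAME = re.compile(r"^NODE_\d+_length_(?P<length>\d+)_cov_(?P<cov>\d+(?:\.\d+)?)")


def spades_coverage(contig_id: str) -> Optional[float]:
    """Return the coverage encoded in a SPAdes contig name, or None if absent."""
    match = SPADES_NAME.match(contig_id)
    return float(match.group("cov")) if match else None


print(spades_coverage("NODE_1_length_52340_cov_23.817"))  # 23.817
print(spades_coverage("contig_0001"))                     # None
```

Contigs from other assemblers do not carry this information in their names, so the sketch returns `None` for them.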

Output


Fig: Replicon distribution and alignment hit frequencies of MPS. Shown are summed plasmid and chromosome alignment hit frequencies per MPS plotted against plasmid/chromosome hit count ratios scaled to [-1 (chromosome), 1 (plasmid)]. Hue: normalized RDS values (min=-100, max=100). Hit count outliers below 10^-4 and above 1 are discarded for the sake of readability.

For each contig classified as a plasmid sequence, the following columns are printed to STDOUT as tab-separated values:

Installation

Platon can be installed via BioConda or Pip. However, we encourage using Conda, which automatically installs all required 3rd party dependencies. In either case, a mandatory database must be downloaded.

BioConda

$ conda install -c conda-forge -c bioconda -c defaults platon

Pip

$ python3 -m pip install --user platon

Platon requires the following 3rd party executables which must be installed & executable:

Database download

Platon requires a mandatory database, which is publicly hosted at Zenodo. Further information is provided in the database section below. To download and unpack it:

$ wget https://zenodo.org/record/4066768/files/db.tar.gz
$ tar -xzf db.tar.gz
$ rm db.tar.gz

The db path can either be provided via parameter (--db) or environment variable (PLATON_DB):

$ platon --db <db-path> genome.fasta

$ export PLATON_DB=<db-path>
$ platon genome.fasta

Additionally, for a system-wide setup, the database can be copied to the Platon base directory:

$ cp -r db/ <platon-installation-dir>
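The three ways of providing the database can be pictured as a simple lookup chain. The Python sketch below is only an illustration: the function `resolve_db_path` is hypothetical, and the precedence shown (parameter, then environment variable, then the installation directory) is an assumption rather than documented Platon behavior.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_db_path(cli_db: Optional[str], base_dir: str) -> Path:
    """Pick the database directory from --db, then PLATON_DB, then <base_dir>/db.

    The lookup order is an assumption made for this sketch.
    """
    if cli_db:                      # value passed via --db
        return Path(cli_db)
    env_db = os.environ.get("PLATON_DB")
    if env_db:                      # value exported as PLATON_DB
        return Path(env_db)
    return Path(base_dir) / "db"    # system-wide copy in the installation directory
```

For example, with `PLATON_DB` unset and no `--db` value, `resolve_db_path(None, "/opt/platon")` falls back to `/opt/platon/db`.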

Usage

usage: platon [-h] [--db DB] [--mode {sensitivity,accuracy,specificity}]
              [--characterize] [--output OUTPUT] [--prefix PREFIX]
              [--threads THREADS] [--verbose] [--version]
              <genome>

Identification and characterization of bacterial plasmid contigs from short-read draft assemblies.

positional arguments:
  <genome>              draft genome in fasta format

optional arguments:
  -h, --help            show this help message and exit
  --db DB, -d DB        database path (default = <platon_path>/db)
  --mode {sensitivity,accuracy,specificity}, -m {sensitivity,accuracy,specificity}
                        applied filter mode: sensitivity: RDS only (>= 95%
                        sensitivity); specificity: RDS only (>=99.9%
                        specificity); accuracy: RDS & characterization
                        heuristics (highest accuracy) (default = accuracy)
  --characterize, -c    deactivate filters; characterize all contigs
  --output OUTPUT, -o OUTPUT
                        output directory (default = current working directory)
  --prefix PREFIX, -p PREFIX
                        file prefix (default = input file name)
  --threads THREADS, -t THREADS
                        number of threads to use (default = number of
                        available CPUs)
  --verbose, -v         print verbose information
  --version, -V         show program's version number and exit

Examples

$ platon genome.fasta

Expert: writing results to results directory with verbose output using 8 threads:

$ platon --db ~/db --output results/ --verbose --threads 8 genome.fasta
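When Platon is called from a script, for instance to process many assemblies, the command line can be assembled programmatically. The following Python sketch uses only options listed in the usage text above. The helper `build_platon_cmd` is hypothetical, and the actual call is left commented out because it requires Platon and its database to be installed.

```python
import subprocess
from typing import List, Optional


def build_platon_cmd(genome: str, db: Optional[str] = None,
                     output: Optional[str] = None,
                     threads: Optional[int] = None,
                     verbose: bool = False) -> List[str]:
    """Assemble a platon command line as an argument list."""
    cmd = ["platon"]
    if db:
        cmd += ["--db", db]
    if output:
        cmd += ["--output", output]
    if verbose:
        cmd.append("--verbose")
    if threads:
        cmd += ["--threads", str(threads)]
    cmd.append(genome)  # the positional <genome> argument goes last
    return cmd


cmd = build_platon_cmd("genome.fasta", db="db", output="results/",
                       threads=8, verbose=True)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run Platon
```

Note that `subprocess.run` does not expand `~` in paths when given an argument list, so a path such as `~/db` would need `os.path.expanduser` first.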

Mode

Platon provides three different modes that control which filters are applied. Accuracy mode is the default.

Sensitivity

In sensitivity mode, Platon classifies all contigs with an RDS value below the sensitivity threshold as chromosomal and all remaining contigs as plasmid. This threshold was defined to achieve 95% sensitivity and was computed via Monte Carlo simulations of artificial contigs, resulting in RDS=-7.9. Use this mode to exclude chromosomal contigs.

Specificity

In specificity mode, Platon classifies all contigs with an RDS value above the specificity threshold as plasmid and all remaining contigs as chromosomal. This threshold was defined to achieve 99.9% specificity and was computed via Monte Carlo simulations of artificial contigs, resulting in RDS=0.7.
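The two RDS-only modes amount to a single threshold comparison each. As a minimal sketch, assuming the thresholds stated above (-7.9 and 0.7), the decision can be written in Python as follows. The function `classify_by_rds` is hypothetical and does not reproduce Platon's implementation. Accuracy mode is left out because it additionally relies on characterization heuristics.

```python
SENSITIVITY_THRESHOLD = -7.9  # RDS cutoff stated for 95% sensitivity
SPECIFICITY_THRESHOLD = 0.7   # RDS cutoff stated for 99.9% specificity


def classify_by_rds(rds: float, mode: str) -> str:
    """Classify a contig from its RDS value in one of the RDS-only modes."""
    if mode == "sensitivity":
        # Below the threshold: chromosomal; everything else: plasmid.
        return "chromosome" if rds < SENSITIVITY_THRESHOLD else "plasmid"
    if mode == "specificity":
        # Above the threshold: plasmid; everything else: chromosomal.
        return "plasmid" if rds > SPECIFICITY_THRESHOLD else "chromosome"
    raise ValueError(f"mode not covered by this sketch: {mode}")


for rds in (-12.0, -3.0, 2.5):
    print(rds, classify_by_rds(rds, "sensitivity"), classify_by_rds(rds, "specificity"))
# -12.0 chromosome chromosome
# -3.0 plasmid chromosome
# 2.5 plasmid plasmid
```

A contig with an RDS between the two thresholds, such as -3.0, is the case where the modes disagree: sensitivity mode keeps it as plasmid, while specificity mode treats it as chromosomal.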

Accuracy (default)

Database

Platon depends on a custom database based on MPS, RDS, the RefSeq Plasmid database, the PlasmidFinder db, manually curated MOB HMM models from MOBscan, custom conjugation and replication HMM models, and oriT sequences from MOB-suite. This database, based on UniProt UniRef90 release 202, can be downloaded here (zipped 1.6 Gb, unzipped 2.8 Gb):

https://zenodo.org/record/4066768/files/db.tar.gz

Please make sure that you use the latest Platon version along with the most recent database version! Older software versions are not compatible with the latest database version.

Dependencies

Platon was developed and tested in Python 3.5 and depends on BioPython (>=1.71).

Citation

As Platon takes advantage of the inc groups, MOB HMMs and oriT sequences of the following databases, please also cite:

Issues

If you run into any issues with Platon, we’d be happy to hear about it! Please start the pipeline with -v (verbose) and do not hesitate to file an issue including as much of the following as possible:
