otu
See OTU picking and Denoising for details.
usage: micca otu [-h] -i FILE [-o DIR] [-r FILE]
[-m {denovo_greedy,denovo_unoise,denovo_swarm,closed_ref,open_ref}]
[-d ID] [-n MINCOV] [-t THREADS] [-g {dgc,agc}] [-s MINSIZE]
[-a {both,plus}] [-c] [-S CHIM_ABSKEW]
[--swarm-differences SWARM_DIFFERENCES] [--swarm-fastidious]
[--unoise-alpha UNOISE_ALPHA]
micca otu assigns similar sequences (marker genes such as 16S rRNA and
the fungal ITS region) to operational taxonomic units (OTUs) or sequence
variants (SVs).
Trimming the sequences to a fixed position before clustering is
*strongly recommended* when they cover partial amplicons or if quality
deteriorates towards the end (common when you have long amplicons and
single-end sequencing).
Removing ambiguous nucleotides 'N' (with the option --maxns 0 in micca
filter) is mandatory if you use the de novo swarm clustering method.
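Since removing 'N' is mandatory for the swarm method, it can be worth confirming that the filtered FASTA file is free of ambiguous bases before clustering. The following is a minimal shell sketch and not part of micca: the function name count_n_records and the file name filtered.fasta are hypothetical, and only N/n is checked (other IUPAC ambiguity codes are ignored).

```shell
# Count FASTA records whose sequence contains an ambiguous base (N or n).
# Handles multi-line records; header lines (starting with ">") are skipped.
# Hypothetical helper for illustration, not a micca command.
count_n_records() {
    awk '
        /^>/   { if (has_n) n++; has_n = 0; next }  # new record: tally the previous one
        /[Nn]/ { has_n = 1 }                        # sequence line containing N
        END    { if (has_n) n++; print n + 0 }      # tally the last record, print the count
    ' "$1"
}
```

For example, `count_n_records filtered.fasta` is expected to print 0 for a file produced by micca filter with --maxns 0.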
micca otu provides the following protocols:
* de novo greedy clustering (denovo_greedy): useful for the
identification of 97% OTUs;
* de novo unoise (denovo_unoise): denoise Illumina sequences using
the UNOISE3 protocol;
* de novo swarm (denovo_swarm): a robust and fast clustering method
(deprecated, it will be removed in version 1.8.0);
* closed-reference clustering (closed_ref): sequences are clustered
against an external reference database and reads that could not be
matched are discarded;
* open-reference clustering (open_ref): sequences are clustered
against an external reference database and reads that could not be
matched are clustered with the 'de novo greedy' protocol.
Outputs:
* otutable.txt: OTU x sample, TAB-separated OTU table file,
containing the number of times an OTU is found in each sample;
* otus.fasta: FASTA file containing the representative sequences (OTUs);
* otuids.txt: OTU ids to original sequence ids (tab-delimited text
file);
* hits.txt: three-column, TAB-separated file with matching sequence,
representative (seed) and identity (if available, else '*');
* otuschim.fasta (only for 'denovo_greedy', 'denovo_swarm' and
'open_ref' when --rmchim is specified): FASTA file containing the
chimeric OTUs.
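As a quick sanity check on the results, per-sample read totals can be computed from otutable.txt. The sketch below assumes the OTU x sample layout described above, with one header row (a label for the OTU column followed by the sample names); that header layout and the function name sample_totals are assumptions made for illustration.

```shell
# Print "sample<TAB>total_reads" for each sample column of an OTU table.
# Assumption: row 1 is a header (OTU column label, then sample names);
# every other row is an OTU id followed by one count per sample.
sample_totals() {
    awk -F '\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
        { for (i = 2; i <= ncol; i++) total[i] += $i }
        END { for (i = 2; i <= ncol; i++) printf "%s\t%d\n", name[i], total[i] }
    ' "$1"
}
```

Running `sample_totals otutable.txt` in the output directory lists one line per sample, which makes it easy to spot samples that lost most of their reads during clustering.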
optional arguments:
-h, --help show this help message and exit
arguments:
-i FILE, --input FILE
input fasta file (required).
-o DIR, --output DIR output directory (default .).
-r FILE, --ref FILE reference sequences in fasta format, required for
'closed_ref' and 'open_ref' clustering methods.
-m {denovo_greedy,denovo_unoise,denovo_swarm,closed_ref,open_ref}, --method {denovo_greedy,denovo_unoise,denovo_swarm,closed_ref,open_ref}
clustering method (default denovo_greedy)
-d ID, --id ID sequence identity threshold (for 'denovo_greedy',
'closed_ref' and 'open_ref', 0.0 to 1.0, default
0.97).
-n MINCOV, --mincov MINCOV
reject sequence if the fraction of alignment to the
reference sequence is lower than MINCOV (for
'closed_ref' and 'open_ref' clustering methods,
default 0.75).
-t THREADS, --threads THREADS
number of threads to use (1 to 256, default 1).
-g {dgc,agc}, --greedy {dgc,agc}
greedy clustering strategy, distance (DGC) or
abundance-based (AGC) (for 'denovo_greedy' and
'open_ref' clustering methods) (default dgc).
-s MINSIZE, --minsize MINSIZE
discard sequences with an abundance value smaller than
MINSIZE after dereplication (>=1, default values are 2
for 'denovo_greedy' and 'open_ref', 1 for
'denovo_swarm' and 8 for 'denovo_unoise').
-a {both,plus}, --strand {both,plus}
search both strands or the plus strand only (for
'closed_ref' and 'open_ref' clustering methods,
default both).
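When micca otu is wrapped in a pipeline script, the method-dependent defaults and requirements listed above can be made explicit. The helper names default_minsize and requires_ref below are hypothetical and only restate the help text; they are not part of micca.

```shell
# Default MINSIZE per clustering method, as listed in the help text.
# Hypothetical helper; prints nothing and fails for methods without a listed default.
default_minsize() {
    case "$1" in
        denovo_greedy|open_ref) echo 2 ;;
        denovo_swarm)           echo 1 ;;
        denovo_unoise)          echo 8 ;;
        *) return 1 ;;   # e.g. closed_ref: no default is listed in the help text
    esac
}

# Succeeds when the method needs a reference FASTA passed with -r/--ref.
requires_ref() {
    case "$1" in
        closed_ref|open_ref) return 0 ;;
        *) return 1 ;;
    esac
}
```

A wrapper could, for instance, call `requires_ref "$method"` and stop with an error message when no reference file was supplied.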
Chimera removal specific options:
-c, --rmchim remove chimeric sequences (ignored in method
'closed_ref').
-S CHIM_ABSKEW, --chim-abskew CHIM_ABSKEW
abundance skew. It is used to distinguish in a three-
way alignment which sequence is the chimera and which
are the parents. If CHIM_ABSKEW=2.0, the parents
should be at least 2 times more abundant than their
chimera (default values are 16.0 for 'denovo_unoise',
2.0 otherwise).
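The abundance condition can be illustrated with a little arithmetic. The sketch below is a simplified reading of the description above, not the chimera detection algorithm itself: it only checks whether the less abundant of two parents is at least CHIM_ABSKEW times as abundant as the candidate chimera. The function name passes_abskew and its argument order are hypothetical.

```shell
# Succeed (exit status 0) when both parents are at least `skew` times as
# abundant as the candidate chimera. Simplified illustration only.
# $1, $2: parent abundances; $3: candidate abundance; $4: CHIM_ABSKEW
passes_abskew() {
    awk -v a="$1" -v b="$2" -v c="$3" -v s="$4" 'BEGIN {
        min = (a < b) ? a : b      # the less abundant parent is the limiting one
        exit !(min >= s * c)       # exit 0 when the skew condition holds
    }'
}
```

With CHIM_ABSKEW=2.0, `passes_abskew 40 25 10 2.0` succeeds (25 >= 20), whereas `passes_abskew 40 15 10 2.0` fails (15 < 20). With the 'denovo_unoise' default of 16.0, a candidate of abundance 10 would need both parents at 160 or more.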
Swarm specific options:
--swarm-differences SWARM_DIFFERENCES
maximum number of differences allowed between two
amplicons. Commonly used d values are 1 (linear
complexity algorithm), 2 or 3, rarely higher. (>=0,
default 1).
--swarm-fastidious when working with SWARM_DIFFERENCES=1, perform a
second clustering pass to reduce the number of small
OTUs (recommended option).
UNOISE specific options:
--unoise-alpha UNOISE_ALPHA
specify the alpha parameter (default 2.0).
Examples
De novo greedy clustering with a 97% similarity threshold, removing
chimeric OTUs:
micca otu -i input.fasta --method denovo_greedy --id 0.97 -c
Open-reference OTU picking protocol with a 97% similarity
threshold, without removing chimeras in the de novo protocol step
and using 8 threads:
micca otu -i input.fasta --method open_ref --threads 8 --id 0.97 \
--ref greengenes_2013_05/rep_set/97_otus.fasta
De novo swarm clustering using 4 threads, with the fastidious option,
chimera removal and no abundance filtering (--minsize 1):
micca otu -i input.fasta --method denovo_swarm --threads 4 \
--swarm-fastidious --rmchim --minsize 1
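The examples above do not cover the denoising protocol. The sketch below assembles a micca otu command for 'denovo_unoise' using only options listed in this help text, and prints it (a dry run) so it can be inspected before being executed. The function name unoise_cmd, the file name input.fasta and the output directory name denovo_unoise_otus are hypothetical.

```shell
# Print a micca otu command line for UNOISE3 denoising with chimera removal.
# Dry run: the command is only printed, not executed.
# $1: input FASTA file, $2: output directory, $3: number of threads
unoise_cmd() {
    printf 'micca otu -i %s -o %s --method denovo_unoise --threads %s --rmchim\n' \
        "$1" "$2" "$3"
}

unoise_cmd input.fasta denovo_unoise_otus 4
```

Because --minsize and --chim-abskew are left out, the 'denovo_unoise' defaults stated above (8 and 16.0) would apply when the printed command is run.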