otu
===

See :doc:`/otu` for details.

.. code-block:: console

    usage: micca otu [-h] -i FILE [-o DIR] [-r FILE]
                    [-m {denovo_greedy,denovo_unoise,denovo_swarm,closed_ref,open_ref}]
                    [-d ID] [-n MINCOV] [-t THREADS] [-g {dgc,agc}] [-s MINSIZE]
                    [-a {both,plus}] [-c] [-S CHIM_ABSKEW]
                    [--swarm-differences SWARM_DIFFERENCES] [--swarm-fastidious]
                    [--unoise-alpha UNOISE_ALPHA]

    micca otu assigns similar sequences (marker genes such as 16S rRNA and
    the fungal ITS region) to operational taxonomic units (OTUs) or sequence 
    variants (SVs).

    Trimming the sequences to a fixed position before clustering is
    *strongly recommended* when they cover partial amplicons or if quality
    deteriorates towards the end (common when you have long amplicons and
    single-end sequencing).

    Removing ambiguous nucleotides 'N' (with the option --maxns 0 in micca
    filter) is mandatory if you use the de novo swarm clustering method.

    micca otu provides the following protocols:

    * de novo greedy clustering (denovo_greedy): useful for the
    identification of 97% OTUs;

    * de novo unoise (denovo_unoise): denoise Illumina sequences using
    the UNOISE3 protocol;

    * de novo swarm (denovo_swarm): a robust and fast clustering method 
    (deprecated, it will be removed in version 1.8.0);

    * closed-reference clustering (closed_ref): sequences are clustered
    against an external reference database and reads that could not be
    matched are discarded;

    * open-reference clustering (open_ref): sequences are clustered
    against an external reference database and reads that could not be
    matched are clustered with the 'de novo greedy' protocol.

    Outputs:

    * otutable.txt: OTU x sample, TAB-separated OTU table file,
    containing the number of times an OTU is found in each sample;

    * otus.fasta: FASTA file containing the representative sequences (OTUs);

    * otuids.txt: OTU ids to original sequence ids (tab-delimited text
    file);

    * hits.txt: three-column, TAB-separated file with matching sequence,
    representative (seed) and identity (if available, else '*');

    * otuschim.fasta (only for 'denovo_greedy', 'denovo_swarm' and
    'open_ref' when --rmchim is specified): FASTA file containing the
    chimeric OTUs.

    optional arguments:
    -h, --help            show this help message and exit

    arguments:
    -i FILE, --input FILE
                            input fasta file (required).
    -o DIR, --output DIR  output directory (default .).
    -r FILE, --ref FILE   reference sequences in fasta format, required for
                            'closed_ref' and 'open_ref' clustering methods.
    -m {denovo_greedy,denovo_unoise,denovo_swarm,closed_ref,open_ref}, --method {denovo_greedy,denovo_unoise,denovo_swarm,closed_ref,open_ref}
                            clustering method (default denovo_greedy)
    -d ID, --id ID        sequence identity threshold (for 'denovo_greedy',
                            'closed_ref' and 'open_ref', 0.0 to 1.0, default
                            0.97).
    -n MINCOV, --mincov MINCOV
                            reject sequence if the fraction of alignment to the
                            reference sequence is lower than MINCOV (for
                            'closed_ref' and 'open_ref' clustering methods,
                            default 0.75).
    -t THREADS, --threads THREADS
                            number of threads to use (1 to 256, default 1).
    -g {dgc,agc}, --greedy {dgc,agc}
                            greedy clustering strategy, distance (DGC) or
                            abundance-based (AGC) (for 'denovo_greedy' and
                            'open_ref' clustering methods) (default dgc).
    -s MINSIZE, --minsize MINSIZE
                            discard sequences with an abundance value smaller than
                            MINSIZE after dereplication (>=1, default values are 2
                            for 'denovo_greedy' and 'open_ref', 1 for
                            'denovo_swarm' and 8 for 'denovo_unoise').
    -a {both,plus}, --strand {both,plus}
                            search both strands or the plus strand only (for
                            'closed_ref' and 'open_ref' clustering methods,
                            default both).

    Chimera removal specific options:
    -c, --rmchim          remove chimeric sequences (ignored in method
                            'closed_ref').
    -S CHIM_ABSKEW, --chim-abskew CHIM_ABSKEW
                            abundance skew. It is used to distinguish, in a
                            three-way alignment, which sequence is the chimera
                            and which are the parents. If CHIM_ABSKEW=2.0, the
                            parents should be at least 2 times more abundant than
                            their chimera (default values are 16.0 for
                            'denovo_unoise', 2.0 otherwise).

    Swarm specific options:
    --swarm-differences SWARM_DIFFERENCES
                            maximum number of differences allowed between two
                            amplicons. Commonly used d values are 1 (linear
                            complexity algorithm), 2 or 3, rarely higher. (>=0,
                            default 1).
    --swarm-fastidious    when working with SWARM_DIFFERENCES=1, perform a
                            second clustering pass to reduce the number of small
                            OTUs (recommended option).

    UNOISE specific options:
    --unoise-alpha UNOISE_ALPHA
                            specify the alpha parameter (default 2.0).

    Examples

    De novo greedy clustering with a 97% similarity threshold, removing
    chimeric OTUs:

        micca otu -i input.fasta --method denovo_greedy --id 0.97 -c

    Open-reference OTU picking protocol with a 97% similarity
    threshold, without removing chimeras in the de novo protocol step
    and using 8 threads:

        micca otu -i input.fasta --method open_ref --threads 8 --id 0.97 \
        --ref greengenes_2013_05/rep_set/97_otus.fasta

    De novo swarm clustering using 4 threads, with the fastidious option
    and chimera removal:

        micca otu -i input.fasta --method denovo_swarm --threads 4 \
        --swarm-fastidious --rmchim --minsize 1
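
The ``otutable.txt`` output listed in the help text is a TAB-separated OTU x sample table, so it can be summarised with a few lines of Python. The following is a minimal sketch and is not part of micca. It assumes that the first row is a header whose first cell labels the OTU column and whose remaining cells are sample names, and that every other row holds an OTU id followed by one count per sample. The function name ``sample_totals`` and the example table are hypothetical.

.. code-block:: python

    import csv
    import io


    def sample_totals(handle):
        """Sum the OTU counts of each sample in a TAB-separated table."""
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        # Assumption: the first column holds the OTU ids, the rest are samples.
        samples = header[1:]
        totals = dict.fromkeys(samples, 0)
        for row in reader:
            if not row:
                continue  # skip blank lines
            for sample, count in zip(samples, row[1:]):
                # int(float(...)) accepts counts written as "3" or "3.0".
                totals[sample] += int(float(count))
        return totals


    # Hypothetical table contents, for illustration only.
    example = "OTU\tsampleA\tsampleB\nOTU1\t10\t0\nOTU2\t3\t7\n"
    print(sample_totals(io.StringIO(example)))

To apply the same function to a real run, open the ``otutable.txt`` file written to the output directory and pass the file handle instead of the in-memory example. If the table layout differs from the assumptions above, adjust the header handling accordingly.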