Denoising (Illumina only)
=========================

Usually, amplicon sequences are clustered into **Operational Taxonomic Units**
(OTUs) using a similarity threshold of 97%, which represents the common working
definition of bacterial species. 

An alternative approach is to identify **Sequence Variants** (SVs, see
:doc:`/otu` for details). This approach avoids clustering sequences at a
predefined similarity threshold and usually relies on a denoising algorithm to
identify the SVs.

In this tutorial we show how to denoise Illumina overlapping paired-end
sequences in order to detect the SVs. Although this tutorial explains how to
apply the pipeline to 16S paired-end Illumina reads, it can be adapted to
Illumina single-end sequencing or to other marker genes/spacers, e.g. **Internal
Transcribed Spacer (ITS)**, **18S** or **28S**.

.. contents:: Table of Contents
    :local:

Data download and preprocessing
-------------------------------

In this tutorial we analyze the same dataset used in :doc:`/pairedend_97`. Read
merging, primer trimming and quality filtering are the same as in
:doc:`/pairedend_97`:

.. code-block:: sh

    wget ftp://ftp.fmach.it/metagenomics/micca/examples/garda.tar.gz
    tar -zxvf garda.tar.gz
    cd garda

    micca mergepairs -i fastq/*_R1*.fastq -o merged.fastq -l 100 -d 30
    micca trim -i merged.fastq -o trimmed.fastq -w CCTACGGGNGGCWGCAG -r GACTACNVGGGTWTCTAATCC -W -R -c
    micca filter -i trimmed.fastq -o filtered.fasta -e 0.75 -m 400
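
As a quick sanity check, it can help to count how many reads survive each
preprocessing step. The sketch below is not part of micca: ``count_fastq`` and
``count_fasta`` are hypothetical helper functions, and they assume unwrapped
FASTQ records (exactly four lines each) and one ``>`` header line per FASTA
record:

.. code-block:: sh

    # Hypothetical helpers (not part of micca).
    # Assumes each FASTQ record spans exactly four lines.
    count_fastq() {
        awk 'END { print NR / 4 }' "$1"
    }

    # Counts FASTA records by counting header lines starting with '>'.
    count_fasta() {
        grep -c '^>' "$1"
    }

    # Report the read count after merging and trimming (missing files are skipped).
    for f in merged.fastq trimmed.fastq; do
        if [ -f "$f" ]; then
            printf '%s\t%s\n' "$f" "$(count_fastq "$f")"
        fi
    done

    # Report the read count after quality filtering.
    if [ -f filtered.fasta ]; then
        printf '%s\t%s\n' filtered.fasta "$(count_fasta filtered.fasta)"
    fi

A sharp drop between two consecutive steps may suggest that the corresponding
parameters deserve a second look.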

Denoising - Sequence Variants identification
--------------------------------------------

The :doc:`/commands/otu` command implements the UNOISE3 protocol
(``denovo_unoise``), which includes dereplication, denoising and chimera
filtering:

.. code-block:: sh

    micca otu -m denovo_unoise -i filtered.fasta -o denovo_unoise_otus -t 4 -c

The :doc:`/commands/otu` command returns several files in the output directory,
including the **SV table** (``otutable.txt``) and a FASTA file containing the
**representative sequences** (``otus.fasta``).
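
The output can be inspected with standard shell tools. The following is a
minimal sketch, assuming that ``otutable.txt`` is a tab-delimited text file
with a single header row of sample names and one row per SV; ``table_shape``
is a hypothetical helper, not a micca command:

.. code-block:: sh

    # Hypothetical helper (not part of micca): prints "<rows> <columns>",
    # excluding the header row and the first (identifier) column.
    # Assumes a tab-delimited table with a single header line.
    table_shape() {
        awk -F '\t' 'NR == 1 { ncol = NF - 1 } END { print NR - 1, ncol }' "$1"
    }

    table=denovo_unoise_otus/otutable.txt
    if [ -f "$table" ]; then
        # Number of SVs (rows) and samples (columns) in the table.
        echo "SVs and samples: $(table_shape "$table")"
        # Number of representative sequences; this should normally match
        # the number of SVs reported above.
        grep -c '^>' denovo_unoise_otus/otus.fasta
    fi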

.. Note::

    See :doc:`/otu` for how to apply the **de novo swarm**,
    **closed-reference** and **open-reference** OTU picking strategies to
    these data.

Further steps
-------------

* :ref:`pairedend_97-taxonomy`

* :ref:`pairedend_97-tree`

* :ref:`pairedend_97-biom`

* :doc:`/phyloseq`

* :doc:`/table`