Denoising (Illumina only)

Usually, amplicon sequences are clustered into Operational Taxonomic Units (OTUs) using a similarity threshold of 97%, which represents the common working definition of bacterial species.

Another approach is to identify Sequence Variants (SVs; see OTU picking and Denoising for details). This approach avoids clustering sequences at a predefined similarity threshold and usually relies on a denoising algorithm to identify the SVs.

In this tutorial we show how to denoise overlapping Illumina paired-end sequences in order to detect the SVs. Although this tutorial explains how to apply the pipeline to 16S paired-end Illumina reads, it can be adapted to Illumina single-end sequencing or to other marker genes/spacers, e.g. Internal Transcribed Spacer (ITS), 18S or 28S.

Data download and preprocessing

In this tutorial we analyze the same dataset used in Paired-end sequencing - 97% OTU. Read merging, primer trimming and quality filtering are the same as in Paired-end sequencing - 97% OTU:

wget ftp://ftp.fmach.it/metagenomics/micca/examples/garda.tar.gz
tar -zxvf garda.tar.gz
cd garda

micca mergepairs -i fastq/*_R1*.fastq -o merged.fastq -l 100 -d 30
micca trim -i merged.fastq -o trimmed.fastq -w CCTACGGGNGGCWGCAG -r GACTACNVGGGTWTCTAATCC -W -R -c
micca filter -i trimmed.fastq -o filtered.fasta -e 0.75 -m 400
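
To see how many reads survive each preprocessing step, the record counts of the intermediate files can be compared. The sketch below is not part of micca: the helper names count_fastq and count_fasta are hypothetical, and it assumes unwrapped FASTQ files (four lines per record) and the file names used in the commands above.

```shell
# Hypothetical helper: count records in a FASTQ file.
# Assumes unwrapped FASTQ, i.e. exactly 4 lines per record.
count_fastq() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Hypothetical helper: count records in a FASTA file
# (one header line starting with ">" per record).
count_fasta() {
    grep -c '^>' "$1"
}

# Report the number of reads after each step, skipping missing files.
for f in merged.fastq trimmed.fastq; do
    if [ -f "$f" ]; then
        echo "$f: $(count_fastq "$f") reads"
    fi
done
if [ -f filtered.fasta ]; then
    echo "filtered.fasta: $(count_fasta filtered.fasta) reads"
fi
```

A large drop between two consecutive files may suggest that the parameters of the corresponding step deserve a second look for the dataset at hand.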

Denoising - Sequence Variants identification

The otu command implements the UNOISE3 protocol (denovo_unoise), which includes dereplication, denoising and chimera filtering:

micca otu -m denovo_unoise -i filtered.fasta -o denovo_unoise_otus -t 4 -c
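
As a rough illustration of what the dereplication step means, identical sequences can be collapsed into unique sequences with an abundance count. This is only a conceptual sketch using standard Unix tools, not the way micca performs dereplication; the function name dereplicate is hypothetical, and the sketch assumes an unwrapped FASTA file (one sequence line per record).

```shell
# Hypothetical helper: print each distinct sequence with its abundance,
# most abundant first (ties broken alphabetically by sequence).
# Assumes unwrapped FASTA: every non-header line is one full sequence.
dereplicate() {
    grep -v '^>' "$1" \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2,2 \
        | awk '{ print $2 "\t" $1 }'
}

# Show the five most abundant unique sequences, if the file is present.
if [ -f filtered.fasta ]; then
    dereplicate filtered.fasta | head -n 5
fi
```

Denoising then works on these unique sequences and their abundances, which is why dereplication comes first in the protocol.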

The otu command returns several files in the output directory, including the SV table (otutable.txt) and a FASTA file containing the representative sequences (otus.fasta).
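
A quick way to inspect these two files is to count the SVs and sum the reads assigned to each sample. The sketch below is not part of micca and rests on an assumption about the table layout: otutable.txt is taken to be tab-delimited, with a header line of sample names and one row per SV whose first column is the SV identifier. The helper names sv_count and sample_totals are hypothetical.

```shell
# Hypothetical helper: number of representative sequences in a FASTA file.
sv_count() {
    grep -c '^>' "$1"
}

# Hypothetical helper: total read count per sample.
# Assumes a tab-delimited table: header row with sample names,
# then one row per SV with the SV identifier in the first column.
sample_totals() {
    awk -F'\t' '
        NR == 1 { n = NF; for (i = 2; i <= n; i++) name[i] = $i; next }
        { for (i = 2; i <= n; i++) total[i] += $i }
        END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], total[i] }
    ' "$1"
}

# Summarize the output directory produced by the otu command, if present.
outdir=denovo_unoise_otus
if [ -d "$outdir" ]; then
    echo "Number of SVs: $(sv_count "$outdir/otus.fasta")"
    sample_totals "$outdir/otutable.txt"
fi
```

If the table layout differs from the assumed one, the column handling in sample_totals would need to be adjusted accordingly.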

Note

See OTU picking and Denoising for how to apply the de novo swarm, closed-reference and open-reference OTU picking strategies to these data.