Denoising (Illumina only)
Usually, amplicon sequences are clustered into Operational Taxonomic Units (OTUs) using a similarity threshold of 97%, which represents the common working definition of bacterial species.
Another approach is to identify Sequence Variants (SVs; see OTU picking and Denoising for details). This approach avoids clustering sequences at a predefined similarity threshold and usually relies on a denoising algorithm to identify the SVs.
In this tutorial we show how to denoise Illumina overlapping paired-end sequences in order to detect the SVs. Although this tutorial explains how to apply the pipeline to 16S paired-end Illumina reads, it can be adapted to Illumina single-end sequencing or to other marker genes/spacers, e.g. Internal Transcribed Spacer (ITS), 18S or 28S.
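To make the difference between 97% clustering and sequence variants concrete, here is a minimal sketch that computes the percent identity of two already aligned, equal-length sequences. The function name percent_identity and the two 40-nt sequences are hypothetical and only serve as an illustration; this is not how micca compares sequences internally.

```shell
# Percent identity between two equal-length, already aligned sequences.
# Prints "NA" if the sequences are empty or differ in length.
percent_identity() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = length(a)
        if (n == 0 || n != length(b)) { print "NA"; exit }
        for (i = 1; i <= n; i++)
            if (substr(a, i, 1) == substr(b, i, 1)) same++
        printf "%.1f\n", 100 * same / n
    }'
}

# Two hypothetical 40-nt sequences that differ only at the last position.
unit="ACGTACGT"
seq_a="$unit$unit$unit$unit$unit"
seq_b="$unit$unit$unit${unit}ACGTACGA"

echo "identity: $(percent_identity "$seq_a" "$seq_b")%"
```

The two sequences are 97.5% identical, so a 97% threshold would place them in the same OTU, whereas a denoising algorithm may report them as two distinct SVs if it judges the difference to be real rather than a sequencing error.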
Data download and preprocessing
In this tutorial we analyze the same dataset used in Paired-end sequencing - 97% OTU. Read merging, primer trimming and quality filtering are the same as in Paired-end sequencing - 97% OTU:
wget ftp://ftp.fmach.it/metagenomics/micca/examples/garda.tar.gz
tar -zxvf garda.tar.gz
cd garda
micca mergepairs -i fastq/*_R1*.fastq -o merged.fastq -l 100 -d 30
micca trim -i merged.fastq -o trimmed.fastq -w CCTACGGGNGGCWGCAG -r GACTACNVGGGTWTCTAATCC -W -R -c
micca filter -i trimmed.fastq -o filtered.fasta -e 0.75 -m 400
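As an optional sanity check, it can help to see how many reads survive each preprocessing step. The sketch below is not part of micca: the helper names count_fastq and count_fasta are hypothetical, and the FASTQ count assumes unwrapped records of exactly four lines each.

```shell
# Count records in a FASTQ file (assumes 4 lines per record, no line wrapping).
count_fastq() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Count records in a FASTA file (one header line starting with ">" per record).
# grep exits with status 1 when the count is 0, hence the "|| true".
count_fasta() {
    grep -c '^>' "$1" || true
}

# Report the number of reads after merging, trimming and filtering,
# skipping any file that does not exist in the current directory.
for f in merged.fastq trimmed.fastq; do
    if [ -f "$f" ]; then
        echo "$f: $(count_fastq "$f") reads"
    fi
done
if [ -f filtered.fasta ]; then
    echo "filtered.fasta: $(count_fasta filtered.fasta) reads"
fi
```

A sharp drop between two consecutive files usually points to the step whose parameters deserve a second look.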
Denoising - Sequence Variants identification
The otu command implements the UNOISE3 protocol (denovo_unoise), which includes dereplication, denoising and chimera filtering:
micca otu -m denovo_unoise -i filtered.fasta -o denovo_unoise_otus -t 4 -c
The otu command returns several files in the output directory, including the SV table (otutable.txt) and a FASTA file containing the representative sequences (otus.fasta).
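A quick way to inspect the result is to count the SVs and sum the reads assigned to each sample. The following is a minimal sketch with a hypothetical helper, summarize_table; it assumes that otutable.txt is tab-delimited, with one header line of sample names and the SV identifier in the first column.

```shell
# Summarize a tab-delimited count table: number of rows (SVs) and
# total counts per sample column.
# Assumption: one header line, identifiers in column 1, counts elsewhere.
summarize_table() {
    awk -F '\t' '
        NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
        { n++; for (i = 2; i <= ncol; i++) total[i] += $i }
        END {
            print "SVs\t" (n + 0)
            for (i = 2; i <= ncol; i++) print name[i] "\t" (total[i] + 0)
        }
    ' "$1"
}

# Run on the table produced above, if it exists.
if [ -f denovo_unoise_otus/otutable.txt ]; then
    summarize_table denovo_unoise_otus/otutable.txt
fi
```

The first output line gives the number of SVs; each following line gives a sample name and its total read count.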
Note
See OTU picking and Denoising for how to apply the de novo swarm, closed-reference and open-reference OTU picking strategies to these data.