Picking OTUs for use in PICRUSt
===============================

PICRUSt (doi: 10.1038/nbt.2676) is a software package designed to predict
metagenome functional content from marker gene (e.g., 16S rRNA) surveys and
full genomes.

This tutorial covers how to pick OTUs from 16S rRNA sequence data for use with
PICRUSt.

.. Note::

    This tutorial requires that :ref:`singleend-quality_filtering` in
    :doc:`singleend` has been completed and that the PICRUSt software is
    installed on your system.

.. Warning::

    PICRUSt 1.0.0 requires the biom-format package v1.3.1 to be installed on
    your system (from the command line run:
    ``pip install biom-format==1.3.1``; for more information see
    http://biom-format.org/).

PICRUSt requires an :ref:`otu-closed_reference` OTU table computed against the
Greengenes reference (clustered at 97% identity). Download the reference
database (Greengenes, version 2013/05), clustered at 97% identity:

.. code-block:: sh

    wget ftp://ftp.fmach.it/metagenomics/micca/dbs/gg_2013_05.tar.gz
    tar -zxvf gg_2013_05.tar.gz

Run the micca closed-reference protocol:

.. code-block:: sh

    micca otu -m closed_ref -i filtered.fasta -o closed_ref_otus -r 97_otus.fasta -d 0.97 -t 4
    cd closed_ref_otus

Report the sample summary:

.. code-block:: sh

    micca tablestats -i otutable.txt -o tablestats
    head tablestats/tablestats_samplesumm.txt

    Sample  Depth   NOTU    NSingle
    Mw_03   1084    132     39
    Mw_06   1387    122     27
    Mw_11   1485    155     44
    Mw_07   1528    150     36
    Mw_01   1537    143     35
    Mw_15   1565    144     35
    Mw_14   1610    149     42
    Mw_02   1670    143     43
    Mw_12   1710    153     54

Rarefying the OTU table before the PICRUSt analysis is always a good idea (see
https://groups.google.com/forum/#!topic/picrust-users/ev5uZGUIPrQ), so we will
rarefy the table at 1084 sequences per sample, the smallest depth shown in the
summary above (sample Mw_03), using :doc:`commands/tablerare`:

.. code-block:: sh

    micca tablerare -i otutable.txt -o otutable_rare.txt -d 1084

Convert the rarefied OTU table into the BIOM format, replacing the OTU IDs with
the original sequence IDs, using the :doc:`commands/tobiom` command:

.. code-block:: sh

    micca tobiom -i otutable_rare.txt -o tables.biom -u otuids.txt

Normalize the OTU table by dividing each OTU by the known/predicted 16S copy
number abundance using the PICRUSt script ``normalize_by_copy_number.py``:

.. code-block:: sh

    normalize_by_copy_number.py -i tables.biom -o normalized_otus.biom

Create the final metagenome functional predictions using the PICRUSt script
``predict_metagenomes.py``:

.. code-block:: sh

    predict_metagenomes.py -i normalized_otus.biom -o metagenome_predictions.biom

Now you can analyze the PICRUSt predicted metagenome as described in
http://picrust.github.io/picrust/tutorials/downstream_analysis.html#downstream-analysis-guide.
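
The rarefaction depth used above (1084) was read by hand from the sample
summary. As a minimal sketch, the smallest depth could instead be extracted
with ``awk``. The helper name ``min_depth`` is hypothetical (it is not part of
micca or PICRUSt), and the sketch assumes that the summary file has one header
line, whitespace-separated columns, sample names without spaces, and the depth
in the second column, as in the output shown in this tutorial:

.. code-block:: sh

    # Hypothetical helper, not a micca command: print the smallest value
    # found in the second column (Depth) of a sample summary file.
    # Assumes one header line and whitespace-separated columns.
    min_depth() {
        awk 'NR > 1 && NF >= 2 {
                 depth = $2 + 0                     # force numeric comparison
                 if (min == "" || depth < min) min = depth
             }
             END { if (min != "") print min }' "$1"
    }

The result could then be passed to the rarefaction step, for example
``micca tablerare -i otutable.txt -o otutable_rare.txt -d "$(min_depth tablestats/tablestats_samplesumm.txt)"``.
Check the value before using it: a single very shallow sample would force all
other samples down to its depth.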