msa
===

.. code-block:: console

    usage: micca msa [-h] -i FILE -o FILE [-m {muscle,nast}]
                     [--muscle-maxiters MUSCLE_MAXITERS] [--nast-template FILE]
                     [--nast-id NAST_ID] [--nast-threads NAST_THREADS]
                     [--nast-mincov NAST_MINCOV] [--nast-strand {both,plus}]
                     [--nast-notaligned FILE] [--nast-hits FILE]
                     [--nast-nofilter] [--nast-notrim]

    micca msa performs a multiple sequence alignment (MSA) on the input file in
    FASTA format.

    micca msa provides two approaches for MSA:

    * MUSCLE (doi: 10.1093/nar/gkh340). It is one of the most widely used
      multiple sequence alignment programs;

    * Nearest Alignment Space Termination (NAST) (doi: 10.1093/nar/gkl244).
      MICCA provides a very fast and memory-efficient implementation of the
      NAST algorithm, based on VSEARCH (https://github.com/torognes/vsearch).
      It requires a pre-aligned database of sequences (--nast-template). For
      16S data, a good template file is the Greengenes Core Set:
      http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/core_set_aligned.fasta.imputed

    optional arguments:
      -h, --help            show this help message and exit

    arguments:
      -i FILE, --input FILE
                            input FASTA file (required).
      -o FILE, --output FILE
                            output MSA file in FASTA format (required).
      -m {muscle,nast}, --method {muscle,nast}
                            multiple sequence alignment method (default muscle).

    MUSCLE specific options:
      --muscle-maxiters MUSCLE_MAXITERS
                            maximum number of MUSCLE iterations. Set to 2 for a
                            good compromise between speed and accuracy (>=1,
                            default 16).

    NAST specific options:
      --nast-template FILE  multiple sequence alignment template file in FASTA
                            format.
      --nast-id NAST_ID     sequence identity threshold to consider a sequence a
                            match (0.0 to 1.0, default 0.75).
      --nast-threads NAST_THREADS
                            number of threads to use (1 to 256, default 1).
      --nast-mincov NAST_MINCOV
                            reject the sequence if the fraction of alignment to
                            the template sequence is lower than NAST_MINCOV.
                            This parameter prevents low-coverage alignments at
                            the ends of the sequences (default 0.75).
      --nast-strand {both,plus}
                            search both strands or the plus strand only
                            (default both).
      --nast-notaligned FILE
                            write sequences that could not be aligned to FILE
                            in FASTA format.
      --nast-hits FILE      write hits to a TAB-delimited file with the query
                            sequence id, the template sequence id and the
                            identity.
      --nast-nofilter       do not remove positions which are gaps in every
                            sequence (useful if you want to apply a Lane mask
                            filter before the tree inference).
      --nast-notrim         align the entire candidate sequence (i.e. do not
                            trim the candidate sequence to the region bounded
                            by the beginning and end points of the alignment
                            span).

    Examples

    De novo MSA using MUSCLE:

        micca msa -i input.fasta -o msa.fasta

    Template-based MSA using NAST with the Greengenes alignment (clustered at
    97% similarity) as template, 4 threads and a sequence identity threshold
    of 75% (the default):

        micca msa -i input.fasta -o msa.fasta -m nast --nast-threads 4 \
        --nast-template greengenes_2013_05/rep_set_aligned/97_otus.fasta
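
Unless ``--nast-nofilter`` is given, positions that are gaps in every aligned
sequence are removed from the NAST output. The Python sketch below illustrates
that column filter and, for comparison, how a Lane-style mask could be applied
to an unfiltered alignment. It is not micca's implementation: the function
names are hypothetical, treating both ``-`` and ``.`` as gap characters is an
assumption, and the mask is assumed to be a string of ``0``/``1`` characters
with one character per alignment column (``1`` = keep). Because such a mask
refers to the column positions of the full template alignment, removing
gap-only columns first would presumably shift those positions, which is why
``--nast-nofilter`` is suggested before masking.

.. code-block:: python

    # Sketch only: hypothetical helpers, not part of micca.
    GAP_CHARS = frozenset("-.")  # assumption: '-' and '.' both denote gaps


    def read_fasta(lines):
        """Parse FASTA lines into a dict of id -> sequence.

        The id is the first whitespace-separated word of the header line.
        """
        records, sid, chunks = {}, None, []
        for line in lines:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if sid is not None:
                    records[sid] = "".join(chunks)
                parts = line[1:].split()
                sid, chunks = (parts[0] if parts else ""), []
            else:
                chunks.append(line)
        if sid is not None:
            records[sid] = "".join(chunks)
        return records


    def remove_gap_only_columns(msa):
        """Drop columns that contain a gap character in every sequence."""
        seqs = list(msa.values())
        if not seqs:
            return {}
        length = len(seqs[0])
        if any(len(s) != length for s in seqs):
            raise ValueError("aligned sequences must all have the same length")
        # Keep a column if at least one sequence has a non-gap residue there.
        keep = [i for i in range(length)
                if any(s[i] not in GAP_CHARS for s in seqs)]
        return {sid: "".join(s[i] for i in keep) for sid, s in msa.items()}


    def apply_mask(msa, mask):
        """Keep only the columns where the 0/1 mask string has a '1'."""
        for sid, s in msa.items():
            if len(s) != len(mask):
                raise ValueError(
                    f"{sid}: length {len(s)} differs from mask length {len(mask)}")
        keep = [i for i, flag in enumerate(mask) if flag == "1"]
        return {sid: "".join(s[i] for i in keep) for sid, s in msa.items()}

For example, ``with open("msa.fasta") as fh: msa = read_fasta(fh)`` loads an
alignment written with ``--nast-nofilter``; ``remove_gap_only_columns(msa)``
then mimics the default filtering, while ``apply_mask(msa, mask)`` applies a
mask string of matching length instead.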
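
The ``--nast-hits`` file is described as TAB-delimited with the query sequence
id, the template sequence id and the identity. The following minimal Python
sketch loads such a file and counts how many query sequences were aligned to
each template sequence. It assumes exactly those three columns in that order
and no header line; it does not assume whether identity is written as a
fraction or a percentage, so the value is simply parsed as a float. The
function names are hypothetical.

.. code-block:: python

    # Sketch only: assumes 3 TAB-separated columns (query, template, identity).
    from collections import Counter


    def read_hits(lines):
        """Parse hit lines into (query_id, template_id, identity) tuples."""
        hits = []
        for lineno, line in enumerate(lines, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) != 3:
                raise ValueError(
                    f"line {lineno}: expected 3 TAB-separated fields, "
                    f"got {len(fields)}")
            query_id, template_id, identity = fields
            hits.append((query_id, template_id, float(identity)))
        return hits


    def queries_per_template(hits):
        """Count the query sequences assigned to each template sequence."""
        return Counter(template_id for _, template_id, _ in hits)

For instance, ``with open("hits.txt") as fh: hits = read_hits(fh)`` (where
``hits.txt`` is whatever file was passed to ``--nast-hits``) followed by
``queries_per_template(hits).most_common(10)`` lists the ten most frequently
used template sequences.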