Swan

April 17, 2015

swan-15-065

Smith-Waterman Alignment

Swan is a simple program for computing local alignments of short DNA sequences using the Smith-Waterman algorithm. It can 1) align two sequences specified on the command line, 2) find the best matching sequences in a reference FASTA file for a single query sequence, or 3) find the best matching sequences between a reference and a query FASTA file.

Swan allows changing the alignment parameters, dumping the alignment matrix, and printing alternate alignments. The -index option speeds up searches by requiring each match to contain at least one perfectly aligned k-mer. The --key-value option produces logical key-value formatted output.

Description

-o <fname>

Output file name, default STDOUT.

-rs <DNA>

Reference DNA sequence.

-qs <DNA>

Query DNA sequence.

-r <DNA>

Reference DNA FASTA file.

-q <DNA>

Query DNA FASTA file.

-q-len <int>

-r-len <int>

Consider only sequences at least as long as specified, for query and reference respectively.

-q-string <string>

-r-string <string>

Consider only sequences from query or reference respectively, for which the identifier matches <string>. Only exact matches are supported - no regular expressions.

-id <num>

Display only matches with at least <num> percent identity (range 0-100).
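As an illustration of an identity threshold, the sketch below computes percent identity for a pair of already aligned strings. It assumes identity means matching columns divided by total alignment columns, with "-" marking a gap; the source does not state which definition Swan uses, and the function name is hypothetical.

```python
def percent_identity(aligned_ref, aligned_query):
    """Percent of alignment columns where both strings hold the same base.

    Assumption: identity = matching columns / all columns, gaps ("-")
    counted as non-matching. Swan's exact definition may differ.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned strings must have equal length")
    if not aligned_ref:
        return 0.0
    matches = sum(
        1 for a, b in zip(aligned_ref, aligned_query) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_ref)
```

Under this definition, an alignment of four columns with one substitution or one gap has 75 percent identity and would pass a threshold of 70 but not 80.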

-index <num>

Require a stretch of <num> consecutive matches. This option exists to speed up queries, so it is only useful with <num> in the range 8-12.
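The idea of a k-mer prefilter can be sketched as follows: before running the full alignment, check whether the two sequences share at least one exact substring of length k, and skip the pair if they do not. This is a minimal sketch of the general technique, not Swan's implementation; the function names are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_kmer(ref, query, k):
    """True if ref and query have at least one exact k-mer in common."""
    return not kmers(ref, k).isdisjoint(kmers(query, k))
```

A set lookup like this is much cheaper than filling an alignment matrix, which is why a prefilter of this kind can speed up searches against a large reference file. Sequences shorter than k yield no k-mers and are always rejected.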

-swp MATCH/SUBSTITUTION/GAP

Set the Smith-Waterman gain for matches and penalties for substitutions and gaps.
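To show how the three parameters enter the algorithm, here is a minimal sketch of the Smith-Waterman matrix fill. It assumes a linear gap penalty and uses illustrative values (match +2, substitution -1, gap -1); Swan's defaults, sign conventions, and gap model are not given in this document, and the function name is hypothetical.

```python
def sw_matrix(ref, query, match=2, substitution=-1, gap=-1):
    """Fill a Smith-Waterman score matrix with a linear gap penalty.

    The parameter values are illustrative assumptions, not Swan's defaults.
    Returns the matrix, the best score, and the (row, column) of that score.
    """
    rows, cols = len(query) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if query[i - 1] == ref[j - 1] else substitution
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + step,  # match or substitution
                H[i - 1][j] + gap,    # gap in the reference
                H[i][j - 1] + gap,    # gap in the query
            )
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    return H, best, best_cell
```

The floor at zero is what makes the alignment local: a poorly scoring prefix is dropped instead of dragging down the rest of the alignment. Tracing back from the best-scoring cell until a zero is reached recovers the aligned region.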

--noindel

Do not consider alignments with gaps.
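A gap-free local alignment only ever extends along a diagonal of the matrix, so the recurrence loses its two gap terms. The sketch below scores the best ungapped local alignment under that assumption; the scoring values and the function name are hypothetical, and this is not necessarily how Swan implements the option.

```python
def best_ungapped_score(ref, query, match=2, substitution=-1):
    """Best local alignment score when no gaps are allowed.

    Each cell depends only on its diagonal predecessor. The scoring
    values are illustrative assumptions, not Swan's defaults.
    """
    best = 0
    prev = [0] * (len(ref) + 1)
    for q in query:
        cur = [0] * (len(ref) + 1)
        for j, r in enumerate(ref, start=1):
            step = match if q == r else substitution
            cur[j] = max(0, prev[j - 1] + step)
            best = max(best, cur[j])
        prev = cur
    return best
```

For the reference ACGAATT and the query ACGTT, the best ungapped alignment covers only ACG, whereas an alignment that may open gaps can also bridge the inserted AA and include the trailing TT.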

--matrix

Dump the alignment matrix.

-cell <num>

Trace the alignment from this matrix cell.

-do <num>

Process the top <num> entries from the reference file.

--key-value

Write output in a logical key-value format.

--excise

Print only the aligned part of the sequence alignment, omitting flanking sequences.