Introduction

DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high-performance analysis of big sequence data. The key features are:

Pairwise alignment of proteins and translated DNA at 100x-20,000x the speed of BLAST.
Frameshift alignments for long read analysis.
Low resource requirements, making it suitable for running on standard desktops or laptops.
Various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.

Stay up to date with new developments by following me on Twitter.

Quick start guide

This section gives a quick example of setting up and using the program on Linux. The software can be installed by downloading the prebuilt binary for immediate use:

wget http://github.com/bbuchfink/diamond/releases/download/v2.0.4/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz

The extracted diamond binary file should be moved to a directory contained in your executable search path (PATH environment variable).
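
As a minimal sketch of that step, the following moves the binary into a per-user directory and adds that directory to PATH for the current shell session. The directory ~/.local/bin is an assumption made for illustration; any directory already on your PATH works equally well.

```shell
# Assumption: ~/.local/bin is the per-user directory chosen for executables.
BIN_DIR="$HOME/.local/bin"
mkdir -p "$BIN_DIR"

# Move the extracted binary only if it is present in the current directory.
if [ -f diamond ]; then
    mv diamond "$BIN_DIR/"
fi

# Prepend the directory to PATH for this session unless it is already listed.
case ":$PATH:" in
    *":$BIN_DIR:"*) ;;
    *) PATH="$BIN_DIR:$PATH"; export PATH ;;
esac
```

To make the change permanent, the PATH assignment would typically go into your shell's startup file.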

To run an alignment task, we assume a protein database file in FASTA format named nr.faa and a file of DNA reads to be aligned named reads.fna.

To set up a reference database for DIAMOND, run the makedb command as follows:

$ diamond makedb --in nr.faa -d nr

This will create a binary DIAMOND database file with the specified name (nr.dmnd). The alignment task may then be initiated using the blastx command like this:

$ diamond blastx -d nr -q reads.fna -o matches.m8

The output file here is specified with the -o option and named matches.m8. By default, it is generated in BLAST tabular format.
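
Because the result is a tab-separated text file, it can be post-processed with standard tools. The sketch below assumes the standard 12-column BLAST tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score). The helper name filter_hits and the default cutoff of 1e-5 are hypothetical choices for illustration, not part of DIAMOND.

```shell
# filter_hits FILE [MAX_EVALUE]
# Hypothetical helper: prints query ID, subject ID, percent identity,
# e-value and bit score for hits with an e-value at or below the cutoff.
# Column positions assume the standard 12-column BLAST tabular layout.
filter_hits() {
    awk -v max_evalue="${2:-1e-5}" '
        BEGIN { FS = OFS = "\t" }
        ($11 + 0) <= (max_evalue + 0) { print $1, $2, $3, $11, $12 }
    ' "$1"
}
```

For example, `filter_hits matches.m8 1e-10 > strong_hits.tsv` would keep only hits with an e-value of 1e-10 or lower.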

Note:

Repeat masking is applied to the query and reference sequences by default. To disable it, use --masking 0.
DIAMOND is optimized for large input files of >1 million proteins. Naturally the tool can be used for smaller files as well, but the algorithm will not reach its full efficiency and runtime comparisons to other tools will not be meaningful.
The program may use quite a lot of memory and temporary disk space. If it fails because it runs out of either, set a lower value for the block size parameter -b.
You can adjust the sensitivity using the options --mid-sensitive, --sensitive, --more-sensitive, --very-sensitive and --ultra-sensitive.
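
The options from these notes can be combined on one command line. The following is only a sketch: it reuses the file names from the example above, and the block size of 1.0 is an assumed value chosen for illustration (consult the documentation for the default and for guidance on suitable values). The guard simply skips the run when the program or the input files are not available.

```shell
# Hypothetical settings for a memory-constrained run; adjust to your data.
SENSITIVITY="--sensitive"   # one of the sensitivity options listed above
BLOCK_SIZE="1.0"            # assumed value; lower it further if memory runs out

if command -v diamond >/dev/null 2>&1 && [ -f nr.dmnd ] && [ -f reads.fna ]; then
    diamond blastx -d nr -q reads.fna -o matches.m8 "$SENSITIVITY" -b "$BLOCK_SIZE"
else
    echo "diamond, nr.dmnd or reads.fna not found; nothing was run" >&2
fi
```

Higher sensitivity settings generally increase runtime, so it can be worth starting with the default mode and raising the sensitivity only if too few hits are reported.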

Documentation

The online documentation is located at http://www.diamondsearch.org. For the markdown source code and an offline, text-based documentation, see: https://github.com/bbuchfink/diamond_docs.

About

DIAMOND is developed by Benjamin Buchfink at the Drost lab, Max Planck Institute for Developmental Biology, Tübingen, Germany.

[Email] [Twitter] [Google Scholar] [Drost lab] [MPI-EBIO]

Publication:

Buchfink B, Xie C, Huson DH, "Fast and sensitive protein alignment using DIAMOND", Nature Methods 12, 59-60 (2015). doi:10.1038/nmeth.3176