blast client

Program Option for Netblast (blastcl3)

Tao Tao, Ph.D.
User Service
NCBI, NLM, NIH

TOC

1. Introduction
2. Installation and setup
3. Firewall settings
4. Options and their accepted values
5. Practical usage examples
6. Trouble shooting and technical assistance
- 6.1 Errors and warnings
- 6.2 Technical assistance

1. Introduction

The NCBI BLAST web server provides a convenient and user-friendly way for individuals to search their queries against different public sequence databases. This server, however, does have some limitations. For example, one cannot perform large-scale batch searches from most of the BLAST pages, and the program selection for some of the available databases is limited. The BLAST client provides a way to circumvent those limitations.

The client bypasses the web browser and interacts directly with the NCBI BLAST server that powers the NCBI web BLAST service (www.ncbi.nlm.nih.gov/BLAST/). It performs a batch search by taking one query sequence at a time from the input file (which may contain multiple FASTA formatted sequences), formulating the search according to the command line, and sending the search through the internet connection to the NCBI BLAST server for processing. The program receives the search result from the BLAST server and saves it to the local file specified on the command line. The program loops through the queries in the input file until all have been searched.

This program has no graphical user interface (GUI) and must be executed from the command line in a terminal window. Users control the program through command line options. A detailed list of command line options is in Section 4. For usage examples, see Section 5.

2. Installation and setup

NCBI provides the BLAST client as an archive separate from the standalone BLAST package (file names beginning with blast) and the server BLAST package (file names beginning with wwwblast). The client archive is available for common platforms as files whose names begin with netblast. They can be found at:

ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/

For a Linux or Unix environment, installation is straightforward. Place the archive in the desired directory and extract it using the following command line:

tar zxvf netblast-##-**.tar.gz

The resulting netblast-## directory contains bin, doc, and data subdirectories. The program, blastcl3, is in the bin subdirectory. The matrices BLAST needs for protein alignments are in the data subdirectory, while the doc subdirectory contains netblast.html (this file) and firewall.html, which has more information on configuration behind a firewall.

The package for Windows can be extracted using WinZip. It does not have this directory structure.

3. Firewall settings

The setup for NCBI network clients has been greatly simplified. If you are not behind a firewall, no further action is required. If you are behind a firewall and already use Sequin or Entrez, or if your system administrator has already performed the setup, then you should be able to start performing searches immediately after installation. Otherwise, you will need to make sure that the following IP address/port combinations are open in the firewall configuration.

Table 3. Firewall Ports Needed by BLAST Client for NCBI Connection
IP Address	Port Number
130.14.29.112	5861
130.14.29.112	5862
130.14.29.112	5863
Note Please refer to 'firewall.html' included in the package for details.
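
As a quick way to confirm that the firewall rules are in place, the ports in Table 3 can be probed with a plain TCP connection attempt. The following is a minimal sketch, not part of the netblast package; the function names are hypothetical and the address/port pairs are simply copied from Table 3. A successful connection only shows that the port is open, not that blastcl3 itself is configured correctly.

```python
import socket

# Address/port pairs copied from Table 3 above.
NCBI_FIREWALL_ENDPOINTS = [
    ("130.14.29.112", 5861),
    ("130.14.29.112", 5862),
    ("130.14.29.112", 5863),
]


def port_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def check_endpoints(endpoints, timeout=5.0):
    """Map each (host, port) pair to its True/False reachability."""
    return {(host, port): port_is_reachable(host, port, timeout)
            for host, port in endpoints}

# Example (performs real network connections when called):
#   for endpoint, ok in check_endpoints(NCBI_FIREWALL_ENDPOINTS).items():
#       print(endpoint, "open" if ok else "blocked or unreachable")
```

Any pair reported as blocked is a candidate to raise with the firewall administrator, together with the details in firewall.html.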

In addition, you need to create an .ncbirc file in your home directory to instruct blastcl3 how to make the connection to NCBI. On a PC running Windows, the file is named ncbi.ini and should be placed in the Windows directory. A sample .ncbirc file is provided below for your reference.

[NCBI]
DATA=/home/johndoe/netblast-2.2.12/data

[CONN]
FIREWALL=TRUE

[NET_SERV]
SRV_CONN_MODE=SERVICE
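
The sample file can also be generated programmatically, which helps when the same setup has to be repeated on several machines. The sketch below uses Python's standard configparser module; the function name render_ncbirc is hypothetical, and the sections and keys are limited to those shown in the sample above.

```python
import configparser
import io


def render_ncbirc(data_dir, firewall=True):
    """Return .ncbirc text containing the sections from the sample above."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep key names in upper case, as in the sample
    cfg["NCBI"] = {"DATA": data_dir}
    if firewall:
        # Only written when the client sits behind a firewall.
        cfg["CONN"] = {"FIREWALL": "TRUE"}
    cfg["NET_SERV"] = {"SRV_CONN_MODE": "SERVICE"}
    buf = io.StringIO()
    # Write KEY=VALUE without spaces around '=', matching the sample.
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()

# Example: write the file to the home directory (not executed here).
#   from pathlib import Path
#   text = render_ncbirc("/home/johndoe/netblast-2.2.12/data")
#   (Path.home() / ".ncbirc").write_text(text)
```

The DATA path must point at the data subdirectory of your own netblast installation; the path in the example is the placeholder from the sample file.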

As an alternative to blastcl3, the NCBI BLAST web server also supports a URL API, which uses URL-encoded commands to interact with Blast.cgi directly, either to "Put" search requests or to "Get" search results. For details on the standard commands, please refer to the online document at:

www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html
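
To illustrate what a URL-encoded "Put" or "Get" command might look like, the sketch below only builds the request URLs and sends nothing. The endpoint address and the parameter names (CMD, PROGRAM, DATABASE, QUERY, RID, FORMAT_TYPE) are assumptions that should be checked against the URL API document above; the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm the current address in the URL API document.
BLAST_CGI = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def put_url(program, database, query):
    """Build a URL that would submit ("Put") one search request."""
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    }
    return BLAST_CGI + "?" + urlencode(params)


def get_url(rid, format_type="Text"):
    """Build a URL that would retrieve ("Get") results for a request ID."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return BLAST_CGI + "?" + urlencode(params)
```

urlencode takes care of escaping characters such as spaces, ">" and "|" that occur in FASTA deflines, which is the main reason to build the URL this way rather than by string concatenation.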

4. Options and their accepted values

As mentioned in Section 1, blastcl3 has no GUI and works only in a command terminal. Users execute the program by issuing command lines and control the way the BLAST search is done through options on the command line. The command line options for this program are listed individually below. The options commonly adjusted during actual searches are: -i, -d, -p, -o, -e, -F, -u, -b, -v, -m, and -n. The first four are the core options needed for a search.

Table 4.1
Option	-p
Function	Specifies which program to run
Default	None, mandatory
Input Format	String
Example	To run blastn program use: -p blastn
Note	Program string options and type of search they specify
	Program	Query	DB
	blastn	nucleotide	nucleotide
	blastp	protein	protein
	blastx	nucleotide, translated	Protein
	tblastn	protein	nucleotide, translated
	tblastx	nucleotide, translated	nucleotide, translated

Table 4.2
Option	-d
Function	Specifies database(s) to be searched
Default	nr
Input Format	String
Example	Multiple databases can be specified on the command line. To search nr and est together use: -d "nr est"
Note	Be conservative. Search against large databases may not complete due to CPU time limit, which is set at one hour.

Table 4.3
Option	-i
Function	Specifies input query file
Default	stdin
Input Format	String, mandatory
Example	To use sequences from query.txt as query, use -i query.txt
Note	One should use the complete file name WITH its extension. To use stdin default, omit the -i and redirect using: < mito.txt

Table 4.4
Option	-e
Function	Specifies Expect value cutoff
Default	10
Input Format	Real
Example	To make the search more stringent, one can use: -e 0.001
Note	Accepted formats are integer, fraction, decimal, exponential and scientific notation. To set the cutoff to 2×10^-20, use -e 2e-20

Table 4.5
Option	-m
Function	Specifies alignment view option
Default	0
Input Format	Integer
Example	To display the result in XML form use: -m 7
Note	Option values and the output formats they specify
	0	Pairwise
	1	query-anchored showing identities
	2	query-anchored no identities
	3	flat query-anchored, show identities
	4	flat query-anchored, no identities
	5	query-anchored no identities and blunt ends
	6	flat query-anchored, no identities and blunt ends
	7	XML Blast output
	8	tabular (no post-processing)
	9	tabular with comment lines (post-processed, sorted)
	10	ASN, text
	11	ASN, binary

Table 4.6
Option	-o
Function	Specifies result output file
Default	stdout (print to screen)
Input Format	String [file name]
Example	To save result in out.txt use: -o out.txt
Note	-p, -i, -d, -o are the core options needed for a blastcl3 search.

Table 4.7
Option	-F
Function	Specifies which filter(s) to use to mask query sequence
Default	T (DUST for nucleotide, SEG for protein)
Input Format	String
Example	To filter low complexity regions for the lookup table only, use: -F "m L"
Note	Accepted strings: T, F, D, L, R, V, S, C, and m.
	m	mask for the lookup table only, which enables BLAST to display the masked region in the alignment
	L	Low complexity
	D	DUST
	R	human Repeats
	V	Vector
	S	SEG, which has user specifiable values: -F "S 10 1.0 1.5" sets the SEG filter to window=10; low cut=1.0; high cut=1.5
	C	COIL, which also has user specifiable values: -F "C 28 40 32" sets the COIL filter to window=28; cutoff=40; linker=32
	To run the SEG and COIL filters together, use: -F "S; C"
	To mask for the lookup table only, add m: -F "m S; C"
	To mask repeat sequences, use: -F R or -F "m R"
	To combine low complexity and repeat masking for the lookup table only, use: -F "m L; R"
	To mask vector sequences, use: -F V
	To call the rodent repeat filter, use: -F "R -d rodent.lib"

Table 4.8
Option	-G
Function	Cost to open a gap
Default	0
Input Format	[Integer]
Example	To increase the gap open penalty to 10, use: -G 10
Note	Zero invokes default (5) for blastn. It varies for blastp, blastx, tblastn, and tblastx. In protein searches, only a controlled set of -G/-E value pairs are acceptable for a given scoring matrix.

Table 4.9
Option	-E
Function	Cost to extend a gap
Default	0
Input Format	[Integer]
Example	To increase the gap extension penalty to 4, use: -E 4
Note	Zero invokes default (2) for blastn. It varies for blastp, blastx, tblastn, and tblastx. In protein searches, only a controlled set of -G/-E value pairs are acceptable for a given scoring matrix.

Table 4.10
Option	-X
Function	X dropoff value for gapped alignment (in bits)
Default	0
Input Format	[Integer]
Example	To increase the gapped alignment dropoff to 40, use: -X 40
Note	Gapped Alignment Dropoff Default Setting (in bits)
	Program	blastn	megablast	tblastx	others
	Value	30	20	0	15

Table 4.11
Option	-I (capital i)
Function	Show GI in definition line
Default	F
Input Format	[T/F]
Example	To activate the GI display use: -I T
Note	Sample display: T: gi|223046|prf||0410468A... F: prf||0410468A...

Table 4.12
Option	-q
Function	Penalty for a nucleotide mismatch
Default	-3
Input Format	[Integer]
Example	To set penalty to -2, use: -q -2
Note	For blastn only, different -r/-q ratios are optimal for aligning sequences with different percentage of similarities.

Table 4.13
Option	-r
Function	Reward for a nucleotide match
Default	1
Input Format	[Integer]
Example	To increase the reward to 2, use: -r 2
Note	For blastn only. Others use external scoring matrix to determine this. See -M table in blastall for more details.

Table 4.14
Option	-v
Function	Number of database sequences to show one-line descriptions for
Default	500
Input Format	[Integer]
Example	To increase the descriptions displayed to 1000 use: -v 1000
Note	Web counterpart is "Descriptions"

Table 4.15
Option	-b
Function	Number of sequences with alignments to show
Default	250
Input Format	[Integer]
Example	To increase the alignment displayed to 1000 use: -b 1000
Note	Upper limit is 200000. Web counterpart: "Alignments". This is NOT the total number of alignment segments or high scoring pairs (HSPs). Rather it is the number of database sequences with HSP(s) to the query.

Table 4.16
Option	-f
Function	Threshold for extending hits
Default	0
Input Format	Integer
Example	To increase this threshold to 15, use: -f 15
Note	Zero invokes the default. Not used by blastn or megablast. Extension Threshold Default Settings
	Program	blastp	blastn	blastx	tblastn	tblastx	megablast
	Value	11	0	12	13	13	0

Table 4.17
Option	-g
Function	Perform gapped alignment
Default	T
Input Format	[T/F]
Example	To do only ungapped alignment, use: -g F
Note	Default is gapped alignment, not available with tblastx.

Table 4.18
Option	-Q
Function	Query genetic code to use
Default	1
Input Format	[Integer]
Example	To set the genetic code (translation table) to 14, use: -Q 14
Note	This determines which translation table to use on the query in translated blastx and tblastx searches. The default is the universal genetic code.

Table 4.19
Option	-D
Function	DB Genetic code
Default	1
Input Format	[Integer]
Example	To set the genetic code (translation table) to 14, use: -D 14
Note	Determines which translation table to use for the database in tblastn and tblastx search. See details at: www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c

Table 4.20
Option	-a
Function	Number of processors to use
Default	1
Input Format	[Integer]
Example	To change this to two CPUs, use: -a 2
Note	From 1 up to the number of CPUs available.

Table 4.21
Option	-O
Function	To save SeqAlign object
Default	N/A
Input Format	String [File Out]
Example	To save SeqAlign object to blast_seqalign, use: -O blast_seqalign
Note	Users can use the output to reformat the result into different formats using NCBI toolkit functions. See /blast/demo/ subdirectory for more information.

Table 4.22
Option	-J
Function	Believe the query definition line
Default	F
Input Format	[T/F]
Example	To set this to true, use: -J T
Note	Default set to false since query file definition lines may not follow NCBI convention.

Table 4.23
Option	-M
Function	Protein scoring matrix to use
Default	BLOSUM62
Input Format	[String]
Example	To change this to PAM30, use: -M PAM30
Note	Accepted value: BLOSUM45, BLOSUM62, BLOSUM80, PAM30, or PAM70.

Table 4.24
Option	-W
Function	Word size
Default	0
Input Format	[Integer]
Example	To set word size to 32, use: -W 32
Note	Word size setting for different programs
	Program	blastn	megablast	all others
	Value	11	28	3

Table 4.25
Option	-z
Function	Effective length of the database
Default	0
Input Format	[Real]
Example	To set this to 10000000, use: -z 10000000
Note	Use zero for the actual database size.

Table 4.26
Option	-K
Function	Number of best hits from a region to keep
Default	0
Input Format	[Integer]
Example	To keep 200 hits, use: -K 200
Note	This selects the specified number of best hits for a given region of the query for further evaluation. Off by default, 100 recommended if used.

Table 4.27
Option	-P
Function	Use multiple hit
Default	0
Input Format	Integer
Example	To do single hit, use: -P 1
Note	Zero is for multiple hit, 1 for single hit. Not applicable to blastn.

Table 4.28
Option	-Y
Function	Effective length of the search space
Default	0
Input Format	[Real]
Example	To set this to 10000000, use: -Y 10000000
Note	This is the product of effective query length and effective database length - actual length corrected for edge effects. Use zero for actual size.

Table 4.29
Option	-S
Function	Strands of the nucleotide query to use in the search
Default	3
Input Format	[Integer]
Example	To search with the reverse complement strand only, use: -S 2
Note	-S Input Code And Meaning for blastn, blastx, and tblastx.
	Meaning	Input	Reverse complement	Both
	Value	1	2	3

Table 4.30
Option	-T
Function	Produce HTML output
Default	F
Input Format	[T/F]
Example	To generate HTML formatted output, use: -T T
Note	With -T T, if the database is from NCBI, BLAST will hot link matched subject sequences to their actual entries in Entrez.

Table 4.31
Option	-u
Function	Restrict search of database to the subset satisfying the query
Default	N/A
Input Format	[Entrez Term] in quotes
Example	To restrict entries to mRNA use: -u "biomol_mrna[prop]"
Note	Argument is a set of Entrez query terms. BLAST server will use the terms to retrieve a list of GI numbers and restrict the BLAST search to entries specified by the list. Make sure valid terms are used. For example, it does not make sense to restrict a search to genomic sequences while searching against the est database. For details, see Entrez Help

Table 4.32
Option	-U
Function	Use lower case filtering of FASTA sequence
Default	F
Input Format	[T/F]
Example	To turn lowercase filter on, use: -U T
Note	Make sure that the query sequences are in UPPERCASE and only the filtered portions are in lowercase.

Table 4.33
Option	-y
Function	X dropoff value for ungapped extensions (in bits)
Default	0
Input Format	[Real]
Example	To increase the dropoff to 25, use: -y 25
Note	Default setting for ungapped alignment X dropoff (-y, in bits)
	Program	blastn	megablast	others
	Value	20	10	7

Table 4.34
Option	-Z
Function	X dropoff value for final gapped alignment (in bits)
Default	0
Input Format	[Integer]
Example	To increase this dropoff to 60, use: -Z 60
Note	Large dropoff value settings may help generate longer alignments. Default setting for final gapped alignment X dropoff (-Z, in bits)
	Program	blastn	megablast	tblastx	all others
	Value	50	50	0	25

Table 4.35
Option	-R
Function	Run rpsblast search
Default	F
Input Format	[T/F]
Example	To run rpsblast search, use: -R T
Note	Performs rpsblast search against CDD database. Requires an appropriate -d input. See "Remote Accessible BLAST Databases" for more information.

Table 4.36
Option	-n
Function	Enable megablast search
Default	F
Input Format	[T/F]
Example	To enable megablast search, use -n T
Note	Invokes the megablast algorithm when set to T. -W will default to 28 and queries will be concatenated. This speeds up the search at the expense of search sensitivity.

Table 4.37
Option	-L
Function	Location on query sequence
Default	N/A
Input Format	[String]
Example	To search with positions 100 to 400 of a query, use: -L "100,400"
Note	In -L "100,400", 100 is the start and 400 the end.

Table 4.38
Option	-A
Function	Multiple hits window size
Default	0
Input Format	[Integer]
Example	To increase the window size to 50, use: -A 50
Note	Default -A setting for different programs
	Program	blastn	megablast	all others
	Value	0	0	40

Table 4.39
Option	-w
Function	Frame shift penalty
Default	0 (no penalty)
Input Format	[Integer]
Example	To set OOF penalty to 10, use: -w 10
Note	Non-zero invokes OOF (Out Of Frame) algorithm for blastx.

Table 4.40
Option	-t
Function	Length of the largest intron allowed in tblastn for linking HSPs
Default	0
Input Format	[Integer]
Example	To allow linking of HSPs 10000 letters apart, use: -t 10000
Note	Zero disables linking. Otherwise, the value specified will be used.

5. Practical usage examples

Before we get into actual use, we need to discuss the format of the input query. The only query format blastcl3 recognizes is FASTA. In this format, each query begins with a definition line, or defline as it is commonly known, which starts with a "greater than" sign (>). The defline contains a basic description of the sequence, such as its source, the gene it represents, or ways to identify the sequence. It is terminated by a hard return. The actual sequence immediately follows the defline on one or more lines, each terminated by a hard return. Multiple query sequences should be concatenated one after another. Sample query sequences are presented below for your reference.

>gi|4557757|ref|NP_000240.1| MutL protein homolog 1
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNGTGIRK
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPK
PCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNA
STVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVY
AAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLP
GLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDIS
SGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDDSRKEM
TAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELF
YQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEI
DEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQ
QSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC
>gi|68348711|ref|NP_001234.2| tumor necrosis factor receptor 8
MRVLLAALGLLFLGALRAFPQDRPFEDTCHGNPSHYYDKAVRRCCYRCPMGLFPTQQCPQRPTDCRKQCE
PDYYLDEADRCTACVTCSRDDLVEKTPCAWNSSRVCECRPGMFCSTSAVNSCARCFFHSVCPAGMIVKFP
GTAQKNTVCEPASPGVSPACASPENCKEPSSGTIPQAKPTPVSPATSSASTMPVRGGTRLAQEAASKLTR
APDSPSSVGRPSSDPGLSPTQPCPEGSGDCRKQCEPDYYLDEAGRCTACVSCSRDDLVEKTPCAWNSSRT
CECRPGMICATSATNSRARCVPYPICAAETVTKPQDMAEKDTTFEAPPLGTQPDCNPTPENGEAPASTSP
TQSLLVDSQASKTLPIPTSAPVALSSTGKPVLDAGPVLFWVILVLVVVVGSSAFLLCHRRACRKRIRQKL
HLCYPVQTSQPKLELVDSRPRRSSTQLRSGASVTEPVAEERGLMSQPLMETCHSVGAAYLESLPLQDASP
AGGPSSPRDLPEPRVSTEHTNNKIEKIYIMKADTVIVGTVKAELPEGRGLAGPAEPELEEELEADHTPHY
PEQETEPPLGSCSDVMLSVEEEGKEDPLPTAASGK

Note that the file containing the query sequences has to be saved as a plain text file.
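
Before submitting a large batch, it can be worth checking that the query file really follows the FASTA layout described above. The following is a minimal sketch of such a reader, not the parser blastcl3 itself uses; the function name read_fasta is hypothetical.

```python
def read_fasta(lines):
    """Yield (defline, sequence) pairs from an iterable of FASTA lines.

    The defline is returned without its leading '>' and the sequence
    lines of each record are joined into one string.
    """
    defline, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if defline is not None:
                yield defline, "".join(chunks)
            defline, chunks = line[1:], []
        elif defline is None:
            raise ValueError("sequence data found before the first '>' defline")
        else:
            chunks.append(line)
    if defline is not None:
        yield defline, "".join(chunks)

# Example: count the queries in a plain text file (not executed here).
#   with open("my_query") as handle:
#       print(sum(1 for _ in read_fasta(handle)))
```

Counting the records this way gives the number of searches the client will send, which is useful for estimating how long a batch run will take.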

The program runs in a command or terminal window. On a PC, the command window can be launched using "Start ► Program ► Accessories ► Command Prompt". On a Mac, the Terminal program icon is usually in the Utilities folder. Double-clicking the grey icon will launch it.

In the terminal window, cd to the directory containing blastcl3, then run the program from there. Typing "blastcl3 -" without quotes and hitting return should display the command line options on the screen. On Mac and Unix/Linux platforms, type "./blastcl3 -" without quotes.

Since the list of available databases has grown significantly, it has been removed from this file. We will document it in a separate file at a later time.

5.1 General nucleotide searches

The primary use of nucleotide BLAST search is to identify the input query by finding if exact match(es) are present in the database. This type of search also is used to identify the genomic counterpart of an input mRNA sequence or vice versa. Sometimes it is also used to search with primer pairs to identify the annealing target and possible secondary annealing sites of the primers.

For sequences from well-studied model organisms, a good approach is to search against the refseq_rna or refseq_genomic database with an Entrez limit. Alternatively, searching against nr, with or without a limit to the target organism, can also offer a good lead.

The following example command lines search the input query file n_seq against either the refseq_rna or nr database and save the results in n_refm.out and n_nr.out, respectively.

blastcl3 -p blastn -i n_seq -d refseq_rna -o n_refm.out
blastcl3 -p blastn -i n_seq -d nr -o n_nr.out

We can further restrict the search to the mouse entries in those two databases by using an Entrez limit, and speed up the search by invoking the megablast algorithm. The following two options will accomplish that:

-u "mouse[organism]" -n T

For easy parsing of the search result, we can request that the result be returned in either XML or "Hit Table" (tabular) format using "-m 7" or "-m 9" without quotes in the command line.
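
As an illustration of why the tabular format is easy to parse, the sketch below reads "-m 8" or "-m 9" output into named records. It assumes the usual twelve tab-separated columns of legacy BLAST tabular output (query id, subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score); the field names and the function name are our own labels, so verify the column order against the "# Fields:" comment line in your own "-m 9" output.

```python
from collections import namedtuple

# Assumed column order of legacy tabular output; field names are our labels.
Hit = namedtuple(
    "Hit",
    "query subject pct_identity length mismatches gap_opens "
    "q_start q_end s_start s_end evalue bit_score",
)


def parse_tabular(lines):
    """Parse tabular BLAST output lines into a list of Hit records."""
    hits = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the comment lines added by -m 9
        f = line.split("\t")
        if len(f) != 12:
            raise ValueError("expected 12 tab-separated columns: %r" % line)
        hits.append(Hit(
            f[0], f[1], float(f[2]),
            int(f[3]), int(f[4]), int(f[5]),
            int(f[6]), int(f[7]), int(f[8]), int(f[9]),
            float(f[10]), float(f[11]),
        ))
    return hits

# Example: keep only strong matches from a saved result (not executed here).
#   with open("n_refm.out") as handle:
#       strong = [h for h in parse_tabular(handle) if h.evalue <= 1e-50]
```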

When searching a genomic DNA sequence against a nucleotide database, we should invoke the repeat filter to mask the repeat regions and prevent the BLAST program from being inundated by spurious hits to those regions. For human, this can be invoked by adding the following filter option to the command line:

-F "m L; R"

Rodent specific repeat filter requires different filter call:

-F "R -d rodent.lib"

Combining these together, the following command line searches the n_seq input nucleotide query file against the human subset in the refseq_genomic database with low complexity and human repeat filter and megablast algorithm. The expect value cutoff is set to 2x10^-10 and the output is saved in refg.output:

blastcl3 -i n_seq -p blastn -d refseq_genomic -u "human[orgn]" -n T -F "m L; R" -e 2e-10 -o refg.output
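
When many such searches are scripted, assembling the command as an argument list avoids shell quoting problems with values such as "m L; R". The sketch below is one possible wrapper, assuming blastcl3 is on the PATH; the function names and keyword arguments are hypothetical, and only options documented in Section 4 are used.

```python
import subprocess


def build_command(program, query, database, output, entrez=None,
                  megablast=False, filter_str=None, evalue=None,
                  view=None, exe="blastcl3"):
    """Return the blastcl3 command as a list of arguments."""
    cmd = [exe, "-p", program, "-i", query, "-d", database, "-o", output]
    if entrez:
        cmd += ["-u", entrez]        # Entrez limit
    if megablast:
        cmd += ["-n", "T"]           # invoke the megablast algorithm
    if filter_str:
        cmd += ["-F", filter_str]    # query filter string
    if evalue is not None:
        cmd += ["-e", str(evalue)]   # Expect value cutoff
    if view is not None:
        cmd += ["-m", str(view)]     # alignment view option
    return cmd


def run_search(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)

# Example reproducing the command line above (not executed here):
#   run_search(build_command("blastn", "n_seq", "refseq_genomic",
#                            "refg.output", entrez="human[orgn]",
#                            megablast=True, filter_str="m L; R",
#                            evalue="2e-10"))
```

Because the list is passed to subprocess.run without a shell, each value reaches blastcl3 as a single argument and no extra quoting is needed.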

5.2 General protein searches

A protein BLAST search can be used to identify the input query protein or its function through matching to other known proteins and their annotation. One such database is refseq_protein. The following command line searches protein sequences in my_query against this database using blastp. The result is saved in my_output.

blastcl3 -p blastp -i my_query -d refseq_protein -o my_output

For functional analysis, a direct search against the cdd database is more informative. Matches from a cdd search identify the conserved functional domain(s) present in the query. The deflines and annotation of these matched domains give better insight into the function of the query. The following command line does such a search against the cdd database (-d cdd) using rpsblast (-R T):

blastcl3 -p blastp -R T -i my_query -d cdd -o my_output

A specific search against the pdb database can be used to identify existing structures with matching sequences, which is useful for structure modeling purposes. We do not support PSI-BLAST or PHI-BLAST searches through blastcl3.

5.3 Translated BLAST searches

Translated searches can be very informative in revealing the possible function of the query since the search and alignment is performed at the protein level, which is more sensitive and biologically relevant.

5.3.1. blastx

This program searches a nucleotide query against a protein database. It first translates the query in all six frames and then searches those protein translations against the specified protein database. It is useful in identifying the potential protein product(s) the query may encode, and may even provide information on the functions of the protein(s) if a good match to a well-characterized protein can be found.

In the example command line below, we are searching the nucleotide sequences in my_query against refseq_protein. The results are saved in the my_output file.

blastcl3 -p blastx -i my_query -d refseq_protein -o my_output

5.3.2 tblastn

This program searches an input query protein sequence against a target nucleotide database to find other potential protein sequences that might be encoded by those nucleotide sequences. It is a good way to find as yet unidentified homologs/paralogs of a given protein query. During the search, the nucleotide database entries are first translated in all six frames. The query protein is then compared against those potential products to identify the matches.

The example given below searches the input protein query file my_query against the est_human database to try to identify human EST entries that may encode proteins similar to the query. The result is saved to my_output:

blastcl3 -p tblastn -i my_query -d est_human -o my_output

5.3.3. tblastx

This program compares all six-frame translations of an input nucleotide query against those of a nucleotide database. Since this search is computationally very expensive, we strongly recommend that you use it with caution, employ a higher search stringency, and limit the search to a smaller, more specific subset of the database using an Entrez limit.

The following command line searches my_query against the human genomic entries in the nt database. The result is saved in my_output.

blastcl3 -p tblastx -i my_query -d nt -u "human[orgn] AND biomol_genomic[prop]" -o my_output

Because these searches are so computationally intensive, we also recommend that users set up local standalone BLAST to perform them if the search volume is large and/or the need is regular.

5.4 Genome BLAST searches

Genome BLAST pages collect the genomic sequences and other sequences specific to an organism in one place for easy access. In addition, the matches from searching these databases often contain links to the graphic display in the Genome Map Viewer for that organism. Those organism-specific genomic and other sequence databases are also available for search using blastcl3, with one major difference - there will be no link to the Map Viewer.

5.4.1 Microbial Genomes and Other Eukaryote Genomes

Depending on the status of the genome, the data can be finished with accompanying protein data, wgs with accompanying protein data, or wgs without accompanying protein data. The database naming convention is "Microbial/Taxid". The example command line below searches the protein database for the E. coli K-12 strain:

blastcl3 -p blastp -i my_query -d Microbial/83333 -o my_output

NCBI is terminating support for BLAST searching of unfinished microbial genomic sequences through the microbial genome BLAST page. The recommended way is to search against the wgs database, since most of these genomic sequences are submitted to NCBI as wgs entries. The following command line example searches the wgs entries of Bacillus anthracis:

blastcl3 -p blastn -d wgs -i my_query -o my_output -u "bacillus anthracis[orgn] AND wgs[prop]"

5.4.2 Higher Genomes

Databases related to higher genomes are grouped by organism, and each group has its own unique database prefix. The genome assemblies are build-specific and are updated when new assemblies are made available. For example, the human genome database and other human-specific databases have the "hs_genome/" prefix. The following example command line searches against all the available human genome assemblies:

blastcl3 -p blastn -i my_query -d hs_genome/all_contig -o my_output

The default filter in the human genome BLAST page is "low complexity, human repeat, and masking lookup table only". To emulate this, we can add -F "m L; R" to the command line.

6. Trouble shooting and technical assistance

6.1 Errors and warnings

Problems encountered while using blastcl3 can be caused by firewall configuration, internet connection interruption, or NCBI server glitches, with firewall configuration being the most common cause. A representative error message may contain "[CONN_Open] Cannot open connection", "<<< Re-establishing NETBLAST Service >>>", or something along those lines.

Adding the following two lines in the .ncbirc file will increase the timeout setting and generate more informative messages that are useful in debugging the problem:

TIMEOUT=300
DEBUG_PRINTOUT=DATA

Search-related errors from the NCBI BLAST server are typically accompanied by the RID for that search. Those RIDs should be kept and sent to NCBI blast-help for troubleshooting.

6.2 Technical assistance

If you encounter netblast problems, please report them to the blast-help alias below. We recommend that you copy the error/warning messages displayed on the screen and provide the detailed command line and other relevant information. Questions or comments on this document and on BLAST in general should also be sent to the blast-help alias.

blast-help@ncbi.nlm.nih.gov

Questions on other NCBI resources should be sent to:

info@ncbi.nlm.nih.gov