fastq_info

fastq_info is a command line program written in Bash for estimating several standard descriptive statistics from FASTQ-formatted High-Throughput Sequencing (HTS) read files. The following statistics are estimated for each FASTQ file:

  ▹   numbers of HTS reads and bases,

  ▹   distribution of HTS read lengths,

  ▹   nucleotide residue content per HTS read position,

  ▹   distribution of Phred scores (Ewing and Green 1998), and corresponding quartiles,

  ▹   distribution of Phred scores per HTS read position, and corresponding quartiles,

  ▹   distribution of the average Phred score per HTS read, and corresponding quartiles,

  ▹   distribution of the number of sequencing error(s) per HTS read (Edgar and Flyvbjerg 2015), and corresponding quartiles.
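The Phred-derived statistics in the list above rest on two standard definitions: a Phred score Q encodes a base-calling error probability p = 10^(-Q/10) (Ewing and Green 1998), and the expected number of errors in a read is the sum of p over its positions (Edgar and Flyvbjerg 2015). The following is a minimal sketch of these two formulas for a single quality string, not the FQreport implementation. The function name read_quality_summary is hypothetical, and taking the average Phred score as the arithmetic mean of Q is an assumption.

```shell
# Hypothetical helper (not part of fastq_info): summarize one FASTQ quality string.
# $1 = quality string, $2 = Phred offset (33 by default, as with option -p).
read_quality_summary() {
  printf '%s\n' "$1" | awk -v offset="${2:-33}" '
    BEGIN {
      # POSIX awk has no ord(): build a character -> ASCII code lookup table
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    {
      n = length($0)
      if (n == 0) next                       # ignore empty quality strings
      sumq = 0; exp_err = 0
      for (i = 1; i <= n; i++) {
        q = ord[substr($0, i, 1)] - offset   # Phred score Q at position i
        sumq += q
        exp_err += 10 ^ (-q / 10)            # error probability p = 10^(-Q/10)
      }
      # assumption: average Phred score = arithmetic mean of Q over the read
      printf "avgQ=%.1f E=%.4f\n", sumq / n, exp_err
    }'
}

# Four bases at Q = 40 ("I") followed by four bases at Q = 20 ("5")
read_quality_summary 'IIII5555'   # prints: avgQ=30.0 E=0.0404
```

In this example the four Q = 20 bases contribute 0.04 expected errors, against 0.0004 for the four Q = 40 bases, which illustrates why a few low-quality positions dominate the expected number of errors of a read.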

Several file compression formats are supported (i.e. bzip2, DSRC 2.0, fqzcomp, gzip, quip), and several output formats are available (e.g. txt, tsv, svg).

Since version 3.0, fastq_info is a Bash wrapper around the program FQreport, a C++ program used in place of AWK to obtain faster running times.

Dependencies

Install the programs listed in the following table, or verify that they are already installed with the required version.

program    version    sources
gawk       > 4.0.0    ftp.gnu.org/gnu/gawk
FQreport   ≥ 1.0      gitlab.pasteur.fr/vlegrand/FQreport

To use fastq_info with standard FASTQ compression formats, it is also expected that the following binaries are available in the $PATH:

  ▹   gzip, required to deal with files compressed using gzip;

  ▹   bzip2, required to deal with files compressed using bzip2;

  ▹   pigz, expected to deal with files compressed using gzip on multiple threads (when not installed, gzip is used instead);

  ▹   pbzip2, expected to deal with files compressed using bzip2 on multiple threads (when not installed, bzip2 is used instead);

  ▹   dsrc, required to deal with files compressed using DSRC 2.0 RC/RC2 (Roguski and Deorowicz 2014);

  ▹   fqzcomp, required to deal with files compressed using fqzcomp 4.0 (Bonfield and Mahoney 2013);

  ▹   quip, required to deal with files compressed using QUIP (Jones et al. 2012).

Not all of these binaries are required to run fastq_info: only the tool(s) matching the compression format of the input files must be available (the format is determined by the file extension; see Usage).
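Before running fastq_info on compressed files, it can help to see which of the binaries listed above are reachable from the $PATH. The sketch below does this with the POSIX builtin command -v; it is an independent illustration and not how fastq_info itself performs its checks. The function name tool_status is hypothetical, and the binary names are assumed to be the default ones given in the list above.

```shell
# Hypothetical helper (not part of fastq_info): report whether a binary is in $PATH.
tool_status() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '%-8s available\n' "$1"
  else
    printf '%-8s missing\n' "$1"
  fi
}

# Default binary names, as listed in the Dependencies section
for tool in gzip bzip2 pigz pbzip2 dsrc fqzcomp quip; do
  tool_status "$tool"
done
```

A binary installed under a different name will be reported as missing by this sketch even though it exists on the system.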

Installation and execution

Clone this repository with the following command line:

git clone https://gitlab.pasteur.fr/GIPhy/fastq_info.git

Go to the directory fastq_info/ and give execute permission to the file fastq_info.sh:

cd fastq_info/
chmod +x fastq_info.sh

and run it with the following command line model:

./fastq_info.sh [options]

If at least one of the indicated programs (see Dependencies) is not available in your $PATH variable (or if a compiled binary has a different name than the default one), fastq_info will either exit with an error message (when a required program is missing) or be unable to process some input FASTQ files (for compressed files). The list of available decompression programs can be checked using option -c (see Usage). To use a required program that is not available in your $PATH variable, edit the file fastq_info.sh and indicate the local path to the corresponding binary(ies) within the code block REQUIREMENTS (approximately lines 70-130).

Usage

Run fastq_info without any option (or with option -h) to read the following documentation:

 USAGE:  fastq_info.sh  [options]  [<file1> <file2> ...] 

 Allowed file extensions (case insensitive):
  .bz
  .bz2
  .bzip
  .bzip2 ... considered as FASTQ-formatted files compressed using bzip2;
             decompressed  using  bunzip2  or pbzip2  (when available in 
             $PATH)
  .dsrc
  .dsrc2 ... considered as  FASTQ-formatted  files compressed using DSRC 
             v2.0 (sun.aei.polsl.pl/dsrc);  decompressed using DSRC v2.0
             (when available in $PATH)
  .fastq
  .fq ...... considered as uncompressed FASTQ-formatted files

  .fqz ..... considered  as   FASTQ-formatted  files   compressed  using 
             fqzcomp  v4  (github.com/jkbonfield/fqzcomp);  decompressed 
             using fqzcomp v4 (when available in $PATH)
  .gz
  .gzip .... considered as FASTQ-formatted files  compressed using gzip;
             decompressed using gunzip or pigz (when available in $PATH)

  .qp ...... considered as  FASTQ-formatted files  compressed using QUIP
             (github.com/dcjones/quip);  decompressed  using QUIP  (when 
             available in $PATH)

 Options:
  -v <char>  output format specified by  one of the following character: 
               r   reduced table in txt format
               f   full table in txt format
               t   full table in tsv format
               s   summary in tsv format
               l   full report in svg format (landscape)
               p   full report in svg format (portrait)
             (default: r)
  -p <int>   Phred quality offset (default: 33)
  -d         DOS end-of-lines in input file(s) (default: not set)
  -t <int>   number of thread(s) for decompressing files (default: 1)
  -c         checks available tools (default: not set)
  -h         prints this help and exits
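The extension rules in the help text above amount to a case-insensitive lookup from file extension to compression format. A minimal sketch of such a lookup is given below, only to make the rules concrete; it is not the actual code of fastq_info.sh. The function name fastq_format and the labels it prints are hypothetical.

```shell
# Hypothetical helper (not the actual fastq_info code): map a file name to the
# compression format implied by its extension, following the help text.
fastq_format() {
  local name
  name="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"   # case-insensitive match
  case "$name" in
    *.bz|*.bz2|*.bzip|*.bzip2) echo bzip2 ;;
    *.dsrc|*.dsrc2)            echo dsrc ;;
    *.fastq|*.fq)              echo uncompressed ;;
    *.fqz)                     echo fqzcomp ;;
    *.gz|*.gzip)               echo gzip ;;
    *.qp)                      echo quip ;;
    *)                         echo unknown ;;
  esac
}

fastq_format SRR001666_2.fastq.gz   # prints: gzip
fastq_format reads.FQ               # prints: uncompressed
```

Only the last extension matters in this sketch, so a file named reads.fastq.gz is treated as gzip-compressed, which matches the examples given below.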

Notes

Examples

The following Bash command line downloads the pair of gzipped FASTQ files SRR001666_1.fastq.gz and SRR001666_2.fastq.gz, which are used as examples:

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR001/SRR001666/SRR001666*.fastq.gz

Basic usage

The following command line runs fastq_info.sh to analyze the second (i.e. R2) downloaded file:

fastq_info.sh  SRR001666_2.fastq.gz

leading to the following standard output:

##File: SRR001666_2.fastq.gz
#no.reads(NR): 7047668
#no.bases(NB): 253716048
#avg.lgt(AL):  36.0
----------------------------------------------------------------
n      Lfreq     pA     pC     pG     pT     pN  Efreq  Q1 Q2 Q3
----- ------ ------ ------ ------ ------ ------ ------  -- -- --
0          .      .      .      .      .      .  47.10   .  .  .
1          .  27.43  24.14  23.31  25.03   0.09  26.29  40 40 40
2          .  27.21  23.34  23.77  25.55   0.13  12.44  40 40 40
3          .  25.58  24.88  24.19  25.17   0.18   5.93  40 40 40
4          .  24.62  25.17  25.17  24.82   0.22   3.01  40 40 40
5          .  24.87  24.22  25.59  25.11   0.21   1.67  40 40 40
6          .  24.69  25.45  25.39  24.23   0.25   1.00  40 40 40
7          .  24.83  24.49  25.80  24.67   0.21   0.65  40 40 40
8          .  24.93  25.01  25.05  24.81   0.21   0.44  40 40 40
9          .  24.43  25.32  25.25  24.77   0.23   0.32  40 40 40
10         .  24.83  25.13  25.50  24.30   0.25   0.24  40 40 40
11         .  24.67  25.24  25.32  24.50   0.26   0.18  40 40 40
12         .  24.43  25.55  25.42  24.38   0.21   0.14  39 40 40
13         .  24.68  25.31  25.39  24.36   0.25   0.11  35 40 40
14         .  24.62  25.33  25.30  24.50   0.26   0.09  33 40 40
15         .  24.36  25.64  25.36  24.40   0.24   0.07  31 40 40
16         .  24.51  25.49  25.45  24.31   0.24   0.05  28 40 40
17         .  24.43  25.61  25.37  24.39   0.21   0.04  25 40 40
18         .  24.19  25.75  25.42  24.25   0.40   0.02  24 40 40
19         .  24.37  25.64  25.53  24.22   0.24   0.01  22 39 40
20         .  24.27  25.68  25.46  24.34   0.25   0.01  20 35 40
21         .  24.06  25.88  25.51  24.26   0.29   0.01  19 33 40
22         .  24.25  25.71  25.59  24.21   0.25   0.01  18 31 40
23         .  24.26  25.66  25.44  24.35   0.29   0.01  16 28 40
24         .  24.11  25.85  25.54  24.24   0.26   0.01  15 26 40
25         .  24.21  25.82  25.59  24.16   0.21   0.01  13 24 40
26         .  24.13  25.89  25.45  24.33   0.20   0.01  12 22 38
27         .  23.94  26.03  25.60  24.24   0.20   0.01  12 21 36
28         .  24.04  25.94  25.60  24.20   0.22   0.01  11 19 33
29         .  23.90  25.77  25.47  24.42   0.43   0.01  10 18 31
30         .  23.68  26.19  25.54  24.32   0.28   0.01  10 17 29
31         .  23.72  25.93  25.52  24.58   0.25   0.01   9 16 28
32         .  23.66  25.83  25.41  24.84   0.26   0.01   9 15 26
33         .  23.39  26.10  25.53  24.78   0.21   0.01   8 14 24
34         .  23.37  26.10  25.54  24.78   0.21   0.02   7 13 23
35         .  23.22  25.87  25.63  25.02   0.26   0.04   6 12 22
36    100.00  22.91  25.92  25.70  25.26   0.22   0.06   6 12 21
----- ------ ------ ------ ------ ------ ------ ------  -- -- --
                                                        Q1 Q2 Q3
                                                        -- -- --
all.Phred(B)                                            18 40 40
avg.Phred(R)                                            26 30 34
no.Errors(E)                                             0  1  2
----------------------------------------------------------------

In the first part of the output table, for each value n (varying from 0 to the largest observed HTS read length), the corresponding row indicates the percentage of HTS reads of length n (column Lfreq), the percentage of nucleotide residues A, C, G, T and N at position n (columns pA, pC, pG, pT, and pN, respectively), the percentage of HTS reads with n expected sequencing error(s) (column Efreq), and the 1st, 2nd and 3rd quartiles of observed Phred scores at position n (columns Q1, Q2 and Q3, respectively). The bottom part of the table summarizes, with three quartiles Q1, Q2 and Q3 each, the distribution of the Phred scores (first row all.Phred(B)), the distribution of the average Phred score per HTS read (middle row avg.Phred(R)), and the distribution of the (expected) number of sequencing error(s) per HTS read (last row no.Errors(E)).

The above example therefore shows that most Phred scores fall below Q = 20 at positions 28-36 (i.e. the median Phred score Q2 is lower than 20 from HTS read position 28 onwards). At least 25% of all sequenced bases are associated with Phred scores < 19 (i.e. first quartile Q1 = 18 in row all.Phred(B)), but at least 50% of the HTS reads have an average Phred score > 29 (median Q2 = 30 in row avg.Phred(R)). However, at least 2 sequencing errors are expected within 25% of the HTS reads (third quartile Q3 = 2 in last row no.Errors(E)).
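The reading of the table given above (median Phred score below 20 from position 28) can also be extracted automatically. The sketch below filters a reduced table read on standard input and prints every position n whose Q2 value is below a threshold. The function name low_median_positions is hypothetical, and the sketch assumes that the table is whitespace-separated with the 11 columns shown above (Q2 being the 10th); the three data rows are copied from the example output.

```shell
# Hypothetical helper: read a reduced table (-v r) on stdin and print the read
# positions n whose median Phred score (column Q2, 10th field) is below $1.
low_median_positions() {
  awk -v min="${1:-20}" '
    # keep only per-position rows: 11 fields, numeric n and numeric Q2
    NF == 11 && $1 ~ /^[0-9]+$/ && $10 ~ /^[0-9]+$/ && $10 + 0 < min + 0 { print $1 }'
}

# Three rows copied from the example table above (positions 27 to 29)
low_median_positions 20 <<'EOF'
n      Lfreq     pA     pC     pG     pT     pN  Efreq  Q1 Q2 Q3
27         .  23.94  26.03  25.60  24.24   0.20   0.01  12 21 36
28         .  24.04  25.94  25.60  24.20   0.22   0.01  11 19 33
29         .  23.90  25.77  25.47  24.42   0.43   0.01  10 18 31
EOF
# prints 28 and 29, one per line
```

With the real tool, the same function could be fed directly, e.g. fastq_info.sh SRR001666_2.fastq.gz | low_median_positions 20; header, separator and summary rows are skipped because they do not have a numeric first field or do not have 11 fields.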

Advanced usage

For more details (i.e. one supplementary column for each observed Phred score Q), a full table can be output using option -v f:

fastq_info.sh  -v f  SRR001666_2.fastq.gz
##File: SRR001666_2.fastq.gz
#no.reads(NR): 7047668
#no.bases(NB): 253716048
#avg.lgt(AL):  36.0
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
n      Lfreq     pA     pC     pG     pT     pN  Efreq  Q1 Q2 Q3 Q=   0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29    30    31    32    33    34    35    36    37    38    39    40
----- ------ ------ ------ ------ ------ ------ ------  -- -- -- ------ ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
0          .      .      .      .      .      .  47.10   .  .  .      .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
1          .  27.43  24.14  23.31  25.03   0.09  26.29  40 40 40    0.1   0.0   0.0   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.2   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2  94.2
2          .  27.21  23.34  23.77  25.55   0.13  12.44  40 40 40    0.1   0.0   0.0   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.2   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2  94.2
3          .  25.58  24.88  24.19  25.17   0.18   5.93  40 40 40    0.2   0.1   0.0   0.1   0.1   0.2   0.1   0.1   0.1   0.1   0.2   0.1   0.1   0.1   0.1   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3  92.3
4          .  24.62  25.17  25.17  24.82   0.22   3.01  40 40 40    0.2   0.1   0.0   0.1   0.1   0.2   0.1   0.1   0.1   0.1   0.2   0.1   0.1   0.1   0.1   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3  92.3
5          .  24.87  24.22  25.59  25.11   0.21   1.67  40 40 40    0.2   0.1   0.0   0.1   0.1   0.2   0.1   0.1   0.1   0.1   0.3   0.1   0.1   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3  91.5
6          .  24.69  25.45  25.39  24.23   0.25   1.00  40 40 40    0.2   0.1   0.0   0.1   0.1   0.2   0.1   0.1   0.1   0.1   0.3   0.1   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.4   0.4   0.4   0.4   0.4   0.4  90.6
7          .  24.83  24.49  25.80  24.67   0.21   0.65  40 40 40    0.2   0.1   0.0   0.1   0.2   0.2   0.1   0.1   0.1   0.2   0.4   0.2   0.2   0.2   0.2   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.4   0.4   0.4   0.4   0.4   0.4   0.4   0.4   0.4   0.5   0.5   0.5   0.5   0.5   0.5   0.5   0.5  87.4
8          .  24.93  25.01  25.05  24.81   0.21   0.44  40 40 40    0.2   0.1   0.0   0.1   0.2   0.2   0.1   0.1   0.1   0.2   0.4   0.2   0.2   0.2   0.3   0.3   0.3   0.3   0.3   0.3   0.3   0.4   0.4   0.4   0.4   0.4   0.4   0.4   0.4   0.5   0.5   0.5   0.5   0.5   0.5   0.5   0.5   0.5   0.5   0.5  86.8
9          .  24.43  25.32  25.25  24.77   0.23   0.32  40 40 40    0.2   0.1   0.1   0.1   0.2   0.3   0.2   0.2   0.2   0.2   0.5   0.3   0.3   0.3   0.3   0.4   0.4   0.4   0.4   0.4   0.5   0.5   0.5   0.5   0.5   0.5   0.6   0.6   0.6   0.6   0.6   0.6   0.6   0.6   0.7   0.7   0.7   0.7   0.7   0.7  82.7
10         .  24.83  25.13  25.50  24.30   0.25   0.24  40 40 40    0.2   0.1   0.1   0.1   0.3   0.3   0.2   0.2   0.2   0.2   0.6   0.3   0.3   0.4   0.4   0.4   0.4   0.5   0.5   0.5   0.5   0.5   0.6   0.6   0.6   0.6   0.6   0.6   0.7   0.7   0.7   0.7   0.7   0.7   0.7   0.7   0.7   0.8   0.8   0.8  80.5
11         .  24.67  25.24  25.32  24.50   0.26   0.18  40 40 40    0.3   0.1   0.1   0.2   0.3   0.3   0.2   0.2   0.2   0.3   0.6   0.3   0.4   0.4   0.4   0.5   0.5   0.5   0.5   0.6   0.6   0.6   0.6   0.6   0.7   0.7   0.7   0.7   0.7   0.7   0.7   0.8   0.8   0.8   0.8   0.8   0.8   0.8   0.8   0.8  78.6
12         .  24.43  25.55  25.42  24.38   0.21   0.14  39 40 40    0.2   0.1   0.1   0.2   0.3   0.4   0.2   0.3   0.3   0.3   0.8   0.4   0.5   0.5   0.5   0.6   0.6   0.6   0.7   0.7   0.7   0.7   0.8   0.8   0.8   0.8   0.8   0.9   0.9   0.9   0.9   0.9   0.9   0.9   0.9   0.9   0.9   0.9   1.0   1.0  74.4
13         .  24.68  25.31  25.39  24.36   0.25   0.11  35 40 40    0.3   0.2   0.1   0.2   0.3   0.4   0.3   0.3   0.3   0.4   0.8   0.5   0.5   0.6   0.6   0.7   0.7   0.7   0.8   0.8   0.8   0.9   0.9   0.9   1.0   1.0   1.0   1.0   1.0   1.1   1.1   1.1   1.1   1.1   1.1   1.1   1.1   1.1   1.1   1.1  70.0
14         .  24.62  25.33  25.30  24.50   0.26   0.09  33 40 40    0.3   0.2   0.1   0.2   0.4   0.5   0.3   0.3   0.4   0.4   0.9   0.5   0.6   0.6   0.7   0.7   0.8   0.8   0.9   0.9   0.9   1.0   1.0   1.0   1.0   1.1   1.1   1.1   1.1   1.1   1.1   1.1   1.1   1.2   1.2   1.1   1.1   1.1   1.1   1.1  67.8
15         .  24.36  25.64  25.36  24.40   0.24   0.07  31 40 40    0.2   0.2   0.1   0.2   0.4   0.5   0.3   0.4   0.4   0.5   1.1   0.6   0.7   0.7   0.8   0.8   0.9   0.9   1.0   1.0   1.0   1.1   1.1   1.1   1.1   1.2   1.2   1.2   1.2   1.2   1.2   1.2   1.2   1.2   1.2   1.2   1.2   1.2   1.2   1.2  64.8
16         .  24.51  25.49  25.45  24.31   0.24   0.05  28 40 40    0.2   0.3   0.1   0.3   0.5   0.7   0.4   0.5   0.5   0.6   1.3   0.8   0.8   0.9   0.9   1.0   1.0   1.1   1.1   1.2   1.2   1.2   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.2  59.8
17         .  24.43  25.61  25.37  24.39   0.21   0.04  25 40 40    0.2   0.3   0.2   0.3   0.6   0.8   0.5   0.5   0.6   0.7   1.6   0.9   1.0   1.1   1.1   1.2   1.2   1.3   1.3   1.4   1.4   1.4   1.4   1.4   1.5   1.5   1.5   1.5   1.5   1.5   1.4   1.4   1.4   1.4   1.4   1.4   1.4   1.3   1.3   1.3  54.9
18         .  24.19  25.75  25.42  24.25   0.40   0.02  24 40 40    0.4   0.3   0.2   0.4   0.6   0.8   0.5   0.6   0.7   0.7   1.7   1.0   1.1   1.2   1.2   1.3   1.3   1.4   1.4   1.5   1.5   1.5   1.5   1.5   1.5   1.6   1.6   1.6   1.5   1.5   1.5   1.5   1.5   1.5   1.4   1.4   1.4   1.4   1.3   1.3  52.2
19         .  24.37  25.64  25.53  24.22   0.24   0.01  22 39 40    0.2   0.3   0.2   0.4   0.7   1.0   0.6   0.7   0.8   0.9   2.0   1.1   1.2   1.3   1.4   1.4   1.5   1.5   1.6   1.6   1.6   1.6   1.7   1.7   1.7   1.7   1.7   1.6   1.6   1.6   1.6   1.6   1.5   1.5   1.5   1.4   1.4   1.4   1.3   1.3  48.8
20         .  24.27  25.68  25.46  24.34   0.25   0.01  20 35 40    0.3   0.4   0.2   0.5   0.9   1.2   0.7   0.8   0.9   1.0   2.4   1.3   1.4   1.5   1.6   1.6   1.7   1.7   1.8   1.8   1.8   1.8   1.8   1.8   1.8   1.8   1.8   1.7   1.7   1.7   1.6   1.6   1.6   1.5   1.5   1.4   1.4   1.4   1.3   1.3  43.9
21         .  24.06  25.88  25.51  24.26   0.29   0.01  19 33 40    0.3   0.5   0.3   0.6   1.0   1.3   0.8   0.9   1.0   1.1   2.6   1.5   1.6   1.7   1.7   1.8   1.8   1.9   1.9   1.9   1.9   1.9   1.9   1.9   1.9   1.8   1.8   1.8   1.8   1.7   1.7   1.6   1.6   1.5   1.5   1.4   1.4   1.4   1.3   1.3  40.7
22         .  24.25  25.71  25.59  24.21   0.25   0.01  18 31 40    0.3   0.5   0.3   0.7   1.1   1.4   0.9   1.0   1.2   1.3   3.0   1.7   1.7   1.8   1.9   2.0   2.0   2.0   2.1   2.1   2.1   2.0   2.0   2.0   2.0   1.9   1.9   1.8   1.8   1.7   1.7   1.6   1.6   1.5   1.5   1.4   1.4   1.3   1.3   1.2  37.3
23         .  24.26  25.66  25.44  24.35   0.29   0.01  16 28 40    0.3   0.6   0.4   0.8   1.3   1.7   1.0   1.2   1.3   1.5   3.4   1.9   2.0   2.1   2.1   2.2   2.2   2.2   2.2   2.2   2.2   2.2   2.1   2.1   2.0   2.0   1.9   1.9   1.8   1.7   1.7   1.6   1.6   1.5   1.4   1.4   1.3   1.3   1.2   1.2  33.3
24         .  24.11  25.85  25.54  24.24   0.26   0.01  15 26 40    0.3   0.7   0.4   0.9   1.4   1.9   1.2   1.4   1.6   1.7   3.9   2.1   2.3   2.3   2.4   2.4   2.4   2.4   2.4   2.4   2.3   2.3   2.2   2.2   2.1   2.0   1.9   1.9   1.8   1.7   1.6   1.6   1.5   1.4   1.4   1.3   1.3   1.2   1.1   1.1  29.4
25         .  24.21  25.82  25.59  24.16   0.21   0.01  13 24 40    0.2   0.9   0.5   1.0   1.7   2.3   1.4   1.6   1.8   2.0   4.4   2.4   2.5   2.5   2.6   2.6   2.6   2.6   2.5   2.5   2.4   2.3   2.3   2.2   2.1   2.0   1.9   1.8   1.8   1.7   1.6   1.5   1.5   1.4   1.3   1.2   1.2   1.1   1.1   1.0  26.0
26         .  24.13  25.89  25.45  24.33   0.20   0.01  12 22 38    0.2   1.0   0.6   1.2   2.0   2.6   1.6   1.8   2.0   2.2   4.9   2.6   2.7   2.7   2.7   2.7   2.7   2.7   2.6   2.5   2.4   2.4   2.3   2.2   2.1   2.0   1.9   1.8   1.7   1.6   1.5   1.5   1.4   1.3   1.2   1.2   1.1   1.1   1.0   1.0  23.4
27         .  23.94  26.03  25.60  24.24   0.20   0.01  12 21 36    0.2   1.2   0.7   1.4   2.2   2.9   1.8   2.0   2.2   2.4   5.2   2.8   2.8   2.9   2.8   2.8   2.8   2.7   2.6   2.5   2.5   2.4   2.3   2.1   2.0   1.9   1.8   1.8   1.6   1.6   1.5   1.4   1.3   1.3   1.2   1.1   1.1   1.0   0.9   0.9  21.3
28         .  24.04  25.94  25.60  24.20   0.22   0.01  11 19 33    0.2   1.4   0.8   1.6   2.5   3.3   2.0   2.2   2.4   2.6   5.7   3.0   3.0   3.0   3.0   2.9   2.9   2.8   2.7   2.6   2.4   2.3   2.2   2.1   2.0   1.9   1.8   1.7   1.6   1.5   1.4   1.3   1.2   1.2   1.1   1.0   1.0   0.9   0.9   0.8  19.0
29         .  23.90  25.77  25.47  24.42   0.43   0.01  10 18 31    0.4   1.6   0.9   1.8   2.8   3.6   2.2   2.4   2.6   2.8   6.0   3.1   3.1   3.1   3.1   3.0   2.9   2.8   2.7   2.6   2.4   2.3   2.2   2.1   1.9   1.8   1.7   1.6   1.5   1.4   1.3   1.3   1.2   1.1   1.0   1.0   0.9   0.9   0.8   0.8  17.3
30         .  23.68  26.19  25.54  24.32   0.28   0.01  10 17 29    0.3   1.8   1.1   2.1   3.2   4.1   2.4   2.7   2.9   3.1   6.4   3.3   3.3   3.2   3.1   3.0   2.9   2.8   2.7   2.5   2.4   2.2   2.1   2.0   1.8   1.7   1.6   1.5   1.4   1.3   1.3   1.2   1.1   1.0   1.0   0.9   0.8   0.8   0.8   0.7  15.4
31         .  23.72  25.93  25.52  24.58   0.25   0.01   9 16 28    0.2   2.1   1.3   2.4   3.5   4.6   2.7   2.9   3.1   3.3   6.8   3.4   3.4   3.3   3.2   3.0   2.9   2.8   2.6   2.5   2.3   2.2   2.0   1.9   1.8   1.6   1.5   1.4   1.3   1.3   1.2   1.1   1.0   1.0   0.9   0.8   0.8   0.7   0.7   0.7  13.8
32         .  23.66  25.83  25.41  24.84   0.26   0.01   9 15 26    0.3   2.5   1.5   2.6   3.9   5.0   2.9   3.1   3.3   3.4   7.1   3.5   3.4   3.3   3.2   3.1   2.9   2.7   2.6   2.4   2.2   2.1   1.9   1.8   1.7   1.6   1.4   1.3   1.2   1.2   1.1   1.0   0.9   0.9   0.8   0.8   0.7   0.7   0.6   0.6  12.6
33         .  23.39  26.10  25.53  24.78   0.21   0.01   8 14 24    0.2   2.9   1.7   3.0   4.4   5.5   3.1   3.4   3.5   3.6   7.4   3.6   3.5   3.4   3.2   3.0   2.9   2.7   2.5   2.3   2.2   2.0   1.9   1.7   1.6   1.5   1.4   1.3   1.2   1.1   1.0   0.9   0.9   0.8   0.8   0.7   0.7   0.6   0.6   0.5  11.0
34         .  23.37  26.10  25.54  24.78   0.21   0.02   7 13 23    0.2   3.3   1.9   3.3   4.8   6.0   3.4   3.6   3.8   3.8   7.6   3.7   3.6   3.4   3.2   3.0   2.8   2.6   2.4   2.2   2.1   1.9   1.8   1.6   1.5   1.4   1.3   1.2   1.1   1.0   0.9   0.9   0.8   0.7   0.7   0.6   0.6   0.6   0.5   0.5   9.7
35         .  23.22  25.87  25.63  25.02   0.26   0.04   6 12 22    0.3   3.9   2.2   3.7   5.2   6.4   3.6   3.8   3.9   4.0   7.8   3.7   3.6   3.4   3.2   2.9   2.7   2.5   2.3   2.1   2.0   1.8   1.7   1.5   1.4   1.3   1.2   1.1   1.0   0.9   0.9   0.8   0.7   0.7   0.6   0.6   0.6   0.5   0.5   0.5   8.8
36    100.00  22.91  25.92  25.70  25.26   0.22   0.06   6 12 21    0.2   4.2   2.3   4.0   5.5   6.6   3.7   3.8   3.9   3.9   7.6   3.6   3.5   3.3   3.0   2.8   2.6   2.4   2.2   2.1   1.9   1.7   1.6   1.5   1.3   1.2   1.1   1.1   1.0   0.9   0.8   0.8   0.7   0.7   0.6   0.6   0.5   0.5   0.5   0.4   9.0
----- ------ ------ ------ ------ ------ ------ ------  -- -- -- ------ ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
                                                        Q1 Q2 Q3 Q=   0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29    30    31    32    33    34    35    36    37    38    39    40
                                                        -- -- -- ------ ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
all.Phred(B)                                            18 40 40    0.2   0.9   0.5   1.0   1.5   1.9   1.1   1.2   1.3   1.4   3.0   1.5   1.6   1.6   1.6   1.5   1.5   1.5   1.5   1.4   1.4   1.4   1.3   1.3   1.2   1.2   1.2   1.1   1.1   1.1   1.0   1.0   1.0   1.0   0.9   0.9   0.9   0.9   0.8   0.8  51.0
avg.Phred(R)                                            26 30 34    0.1   0.0   0.0   0.0   0.1   0.1   0.1   0.1   0.2   0.2   0.2   0.3   0.3   0.3   0.4   0.4   0.5   0.6   0.8   1.0   1.3   1.6   2.1   2.7   3.5   4.2   5.1   5.8   6.5   7.0   7.3   7.3   7.1   6.7   6.2   5.5   4.8   3.9   3.0   1.9   0.7
no.Errors(E)                                             0  1  2      .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Tab-delimited summary

To help with reading, the main Phred-derived statistics for all files can be summarized in tab-delimited format using option -v s:

fastq_info.sh  -v s  SRR001666*.fastq.gz
#File                NR      NB        AL    BQ1 BQ2 BQ3  RQ1 RQ2 RQ3  EQ1 EQ2 EQ3
SRR001666_1.fastq.gz 7047668 253716048 36.0  30  40  40   32  35  37   0   0   0
SRR001666_2.fastq.gz 7047668 253716048 36.0  18  40  40   26  30  34   0   1   2

This simple output format makes it easy to read, for every file, its name (#File), its numbers of HTS reads (NR) and bases (NB), its average HTS read length (AL), as well as the three quartiles of the three Phred-related distributions, i.e. Phred score per base (BQ1, BQ2, BQ3), average Phred score per HTS read (RQ1, RQ2, RQ3), and expected number of sequencing error(s) per HTS read (EQ1, EQ2, EQ3).

The above example clearly shows that the overall sequencing error rate is lower in file SRR001666_1.fastq.gz than in file SRR001666_2.fastq.gz, therefore leading to many more HTS reads without sequencing error in the former FASTQ file.
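Because the summary is tab-delimited, it can be post-processed with standard tools. The sketch below lists the files whose median expected number of errors per read (EQ2) reaches a given threshold. The function name files_with_median_errors is hypothetical, and the sketch assumes that the real output contains 13 columns separated by single tab characters, in the order of the header line shown above; the two data rows are copied from the example.

```shell
# Hypothetical helper: read a tab-delimited summary (-v s) on stdin and print
# the files whose median expected number of errors per read (EQ2) is >= $1.
# Assumption: 13 tab-separated columns, EQ2 being the 12th one.
files_with_median_errors() {
  awk -F '\t' -v min="${1:-1}" '
    /^#/ { next }                       # skip the header line
    $12 + 0 >= min + 0 { print $1 }'
}

# The two rows of the example above, rebuilt with explicit tab separators
{
  printf '#File\tNR\tNB\tAL\tBQ1\tBQ2\tBQ3\tRQ1\tRQ2\tRQ3\tEQ1\tEQ2\tEQ3\n'
  printf 'SRR001666_1.fastq.gz\t7047668\t253716048\t36.0\t30\t40\t40\t32\t35\t37\t0\t0\t0\n'
  printf 'SRR001666_2.fastq.gz\t7047668\t253716048\t36.0\t18\t40\t40\t26\t30\t34\t0\t1\t2\n'
} | files_with_median_errors 1     # prints: SRR001666_2.fastq.gz
```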

Graphical reports

Graphical representations of the output results can be obtained using option -v l or -v p (landscape or portrait layout, respectively), e.g.

fastq_info.sh  -v l  SRR001666_1.fastq.gz > SRR001666_1.svg
fastq_info.sh  -v p  SRR001666_2.fastq.gz > SRR001666_2.svg
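When many files are involved, the two command lines above generalize to a loop that writes one SVG report per input file. The sketch below is one possible way to do so; the function names svg_name and make_reports are hypothetical, and it assumes that fastq_info.sh can be called as in the examples above. Only the naming helper is executed here.

```shell
# Hypothetical helper: derive an SVG file name from a FASTQ file name by
# dropping the directory and everything after the first dot of the base name.
svg_name() {
  local base="${1##*/}"
  printf '%s.svg\n' "${base%%.*}"
}

# Hypothetical wrapper: write one landscape report per input file into the
# current directory. Assumption: fastq_info.sh is callable as in the examples.
make_reports() {
  local f
  for f in "$@"; do
    fastq_info.sh -v l "$f" > "$(svg_name "$f")"
  done
}

svg_name SRR001666_1.fastq.gz   # prints: SRR001666_1.svg
```

Note that file names containing extra dots (e.g. sample.v2.fastq.gz) would be shortened to their first component by this naming rule, so two such files could overwrite each other's report.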

Landscape layout leads to the following figure with SRR001666_1.fastq.gz:

Portrait layout leads to the following figure with SRR001666_2.fastq.gz:

References

Bonfield JK, Mahoney MV (2013) Compression of FASTQ and SAM format sequencing data. PLOS One, 8(3):e59190. doi:10.1371/journal.pone.0059190.

Edgar RC, Flyvbjerg H (2015) Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics, 31(21):3476-3482. doi:10.1093/bioinformatics/btv401.

Ewing B, Green P (1998) Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities. Genome Research, 8:186-194. doi:10.1101/gr.8.3.186.

Jones DC, Ruzzo WL, Peng X, Katze MG (2012) Compression of next-generation sequencing reads aided by highly efficient de novo assembly. Nucleic Acids Research, 40(22):e171. doi:10.1093/nar/gks754.

Roguski L, Deorowicz S (2014) DSRC 2 - Industry-oriented compression of FASTQ files. Bioinformatics, 30(15):2213-2215. doi:10.1093/bioinformatics/btu208.