bamPEFragmentSize
This tool calculates the fragment sizes for read pairs in a BAM file from paired-end sequencing. Several regions are sampled, depending on the size of the genome and the number of processors, to estimate summary statistics on the fragment lengths. Properly paired reads are preferred for the computation, i.e., discordant pairs are only used if no concordant alignments overlap a given region. By default, the summary statistics are simply printed to the screen.
usage: bamPEFragmentSize [-h] [--bamfiles bam files [bam files ...]] [--histogram FILE]
[--plotFileFormat FILETYPE] [--numberOfProcessors INT]
[--samplesLabel SAMPLESLABEL [SAMPLESLABEL ...]] [--plotTitle PLOTTITLE]
[--maxFragmentLength MAXFRAGMENTLENGTH] [--logScale] [--binSize INT]
[--distanceBetweenBins INT] [--blackListFileName BED file] [--table FILE]
[--outRawFragmentLengths FILE] [--verbose] [--version]
Named Arguments
- --bamfiles, -b
List of BAM files to process
- --histogram, -hist, -o
Save a .png file with a histogram of the fragment length distribution.
- --plotFileFormat
Possible choices: png, pdf, svg, eps, plotly
Image format type. If given, this option overrides the image format inferred from the file ending of the --histogram output. The available options are: png, eps, pdf, svg and plotly.
- --numberOfProcessors, -p
Number of processors to use. (Default: 1)
- --samplesLabel
Labels for the samples plotted. The default is to use the file name of the sample. The sample labels should be separated by spaces and quoted if a label itself contains a space, e.g., --samplesLabel label-1 "label 2"
- --plotTitle, -T
Title of the plot, to be printed on top of the generated image. Leave blank for no title. (Default: no title)
- --maxFragmentLength
The maximum fragment length in the histogram. A value of 0 (the default) indicates to use twice the mean fragment length. (Default: 0)
- --logScale
Plot on the log scale
- --binSize, -bs
Length in bases of the window used to sample the genome. (Default: 1000)
- --distanceBetweenBins, -n
To reduce the computation time, not every possible genomic bin is sampled. This option allows you to set the distance between bins actually sampled from. Larger numbers are sufficient for high coverage samples, while smaller values are useful for lower coverage samples. Note that if you specify a value that results in too few (<1000) reads sampled, the value will be decreased. (Default: 1000000)
- --blackListFileName, -bl
A BED file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered.
- --table
In addition to printing read and fragment length metrics to the screen, write them to the given file in tabular format.
- --outRawFragmentLengths
Save the fragment (or read, if the input is single-end) lengths and their associated numbers of occurrences to a tab-separated file. Columns are length, number of occurrences, and the sample label.
- --verbose
Set to print messages while the data is being processed.
- --version
Show the program's version number and exit.
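The sampling and histogram options above lend themselves to a little arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that windows of --binSize bases start every --binSize + --distanceBetweenBins bases, which the text above suggests but does not spell out. It also ignores the automatic reduction of --distanceBetweenBins when too few reads are sampled.

```python
import math


def estimate_sampled_bins(chrom_length, bin_size=1000, distance_between_bins=1000000):
    """Roughly count the windows sampled on one chromosome.

    Assumption (not confirmed by the documentation): a window of
    `bin_size` bases starts every `bin_size + distance_between_bins` bases.
    """
    step = bin_size + distance_between_bins
    return math.ceil(chrom_length / step)


def histogram_upper_limit(max_fragment_length, mean_fragment_length):
    """Apply the documented rule: 0 means twice the mean fragment length."""
    if max_fragment_length > 0:
        return max_fragment_length
    return 2 * mean_fragment_length


# A hypothetical 100 Mb chromosome with the default settings.
print(estimate_sampled_bins(100_000_000))
# With --maxFragmentLength 0, the limit follows the mean fragment length.
print(histogram_upper_limit(0, 150.0))
# An explicit --maxFragmentLength is used as given.
print(histogram_upper_limit(1000, 150.0))
```

Under these assumptions, lowering --distanceBetweenBins increases the number of sampled windows roughly in proportion, which is why smaller values help with low-coverage samples.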
Example usage
$ deepTools2.0/bin/bamPEFragmentSize \
-hist fragmentSize.png \
-T "Fragment size of PE RNA-seq data" \
--maxFragmentLength 1000 \
-b testFiles/RNAseq_sample1.bam testFiles/RNAseq_sample2.bam \
testFiles/RNAseq_sample3.bam testFiles/RNAseq_sample4.bam \
--samplesLabel sample1 sample2 sample3 sample4
## Output
BAM file : testFiles/RNAseq_sample1.bam
Sample size: 10815
Fragment lengths:
Min.: 0.0
1st Qu.: 311.0
Mean: 8960.68987517
Median: 331.0
3rd Qu.: 362.0
Max.: 53574842.0
Std: 572421.46625
Read lengths:
Min.: 20.0
1st Qu.: 101.0
Mean: 99.1621821544
Median: 101.0
3rd Qu.: 101.0
Max.: 101.0
Std: 9.16567362755
BAM file : testFiles/RNAseq_sample2.bam
Sample size: 6771
Fragment lengths:
Min.: 43.0
1st Qu.: 148.0
Mean: 176.465071629
Median: 164.0
3rd Qu.: 185.0
Max.: 500.0
Std: 53.733877263
......(output truncated)

If the --table option is specified, the summary statistics are additionally printed in a tabular format:
	Frag. Len. Min.	Frag. Len. 1st. Qu.	Frag. Len. Mean	Frag. Len. Median	Frag. Len. 3rd Qu.	Frag. Len. Max	Frag. Len. Std.	Read Len. Min.	Read Len. 1st. Qu.	Read Len. Mean	Read Len. Median	Read Len. 3rd Qu.	Read Len. Max	Read Len. Std.
bowtie2 test1.bam	241.0	241.5	244.666666667	242.0	246.5	251.0	4.49691252108	251.0	251.0	251.0	251.0	251.0	251.0	0.0
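A table like this can be loaded for downstream use with a few lines of Python. The sketch below is a minimal example, not part of deepTools: it assumes the file is tab-separated and that the header row has an empty first cell above the sample label column (the header lists 14 metrics while each row carries 15 fields). The `read_table` helper is a hypothetical name, and the embedded text reuses the values shown above.

```python
import csv
import io

# Assumed layout of a --table file: tab-separated, with an empty first header
# cell above the sample label. Values are copied from the example above.
TABLE = (
    "\tFrag. Len. Min.\tFrag. Len. 1st. Qu.\tFrag. Len. Mean\tFrag. Len. Median\t"
    "Frag. Len. 3rd Qu.\tFrag. Len. Max\tFrag. Len. Std.\tRead Len. Min.\t"
    "Read Len. 1st. Qu.\tRead Len. Mean\tRead Len. Median\tRead Len. 3rd Qu.\t"
    "Read Len. Max\tRead Len. Std.\n"
    "bowtie2 test1.bam\t241.0\t241.5\t244.666666667\t242.0\t246.5\t251.0\t"
    "4.49691252108\t251.0\t251.0\t251.0\t251.0\t251.0\t251.0\t0.0\n"
)


def read_table(handle):
    """Return {sample label: {metric name: value}} from a --table file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    metrics = header[1:]  # skip the unlabeled sample column
    result = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        sample, values = row[0], row[1:]
        result[sample] = {name: float(value) for name, value in zip(metrics, values)}
    return result


stats = read_table(io.StringIO(TABLE))
print(stats["bowtie2 test1.bam"]["Frag. Len. Median"])
```

To read a real file, replace `io.StringIO(TABLE)` with an open file handle, e.g. `open(path, newline="")`.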
If the --outRawFragmentLengths option is provided, an additional output file (a separate history item in Galaxy) will be produced, containing the raw data underlying the histogram. It has the following format:
#bamPEFragmentSize
Size Occurrences Sample
241 1 bowtie2 test1.bam
242 1 bowtie2 test1.bam
251 1 bowtie2 test1.bam
“Size” is the fragment (or read, for single-end datasets) size and “Occurrences” is the number of times reads/fragments with that length were observed. To ease downstream processing, the sample name is also included on each row.
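Because each row stores a length together with its count, summary statistics can be recomputed from this file without expanding it into one value per fragment. The following Python sketch illustrates one way to do this. It assumes tab-separated columns, and the helper names `read_raw_lengths` and `summarize` are hypothetical, not part of deepTools.

```python
import io
import math
from collections import defaultdict

# Example file contents copied from the text above, assuming tab-separated columns.
RAW = (
    "#bamPEFragmentSize\n"
    "Size\tOccurrences\tSample\n"
    "241\t1\tbowtie2 test1.bam\n"
    "242\t1\tbowtie2 test1.bam\n"
    "251\t1\tbowtie2 test1.bam\n"
)


def read_raw_lengths(handle):
    """Return {sample label: {size: occurrences}} from a raw-lengths file."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines, the leading comment line and the column header.
        if not line or line.startswith("#") or line.startswith("Size\t"):
            continue
        # Split at most twice so that labels containing spaces stay intact.
        size, occurrences, sample = line.split("\t", 2)
        counts[sample][int(size)] += int(occurrences)
    return counts


def summarize(size_counts):
    """Compute mean, median and population standard deviation from counts."""
    total = sum(size_counts.values())
    mean = sum(size * n for size, n in size_counts.items()) / total
    variance = sum(n * (size - mean) ** 2 for size, n in size_counts.items()) / total

    def value_at(rank):
        # Walk the sorted sizes until the cumulative count passes `rank` (0-based).
        seen = 0
        for size in sorted(size_counts):
            seen += size_counts[size]
            if rank < seen:
                return size

    median = (value_at((total - 1) // 2) + value_at(total // 2)) / 2
    return {"mean": mean, "median": median, "std": math.sqrt(variance)}


summary = summarize(read_raw_lengths(io.StringIO(RAW))["bowtie2 test1.bam"])
print(summary)
```

For this three-row example, the results agree with the fragment length mean, median, and Std. in the --table output above when the standard deviation is computed as a population standard deviation. Treat that as an observation about this example rather than a documented guarantee.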