As of version 0.2.5, pindel supports calling dispersed duplication insertions (DD). DD includes all duplications, such as mobile element insertions and segmental duplications, as long as a copy exists in the provided reference genome. This module of pindel focuses on dispersed duplications; tandem duplications are typically not included, but the minimum distance at which events are called can be controlled with the --MIN_DD_MAP_DISTANCE parameter.
One can run pindel with the usual mandatory parameters and add "-q" to turn on detection of DD. Pindel then collects evidence for DD events in the form of discordant read pairs (where one end maps inside the duplicated segment and the other just outside it) and split reads (where one read end covers the breakpoint of the insertion on the reference).
All parameters related to DD detection:
-q/--detect_DD Flag indicating whether to detect dispersed duplications. (default: false)
--MAX_DD_BREAKPOINT_DISTANCE Maximum distance between dispersed duplication breakpoints to assume they refer to the same event. (default: 350)
--MAX_DISTANCE_CLUSTER_READS Maximum distance between reads for them to provide evidence for a single breakpoint for dispersed duplications. (default: 100)
--MIN_DD_CLUSTER_SIZE Minimum number of reads needed for calling a breakpoint for dispersed duplications. (default: 3)
--MIN_DD_BREAKPOINT_SUPPORT Minimum number of split reads for calling an exact breakpoint for dispersed duplications. (default: 3)
--MIN_DD_MAP_DISTANCE Minimum mapping distance of read pairs for them to be considered discordant. (default: 8000)
--DD_REPORT_DUPLICATION_READS Report discordant sequences and positions for mates of reads mapping inside dispersed duplications. (default: false)
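The parameters listed above can be combined into a command line. The following is a minimal Python sketch, not part of pindel itself. It assumes that the "usual mandatory parameters" are -f (reference FASTA), -i (BAM configuration file), -c (chromosome) and -o (output prefix), as in pindel's general usage, and that long options take their value as the next argument. The function name build_pindel_dd_command and all file names are hypothetical.

```python
import shlex

# DD-specific numeric options and their documented defaults (from the list above).
DD_DEFAULTS = {
    "MAX_DD_BREAKPOINT_DISTANCE": 350,
    "MAX_DISTANCE_CLUSTER_READS": 100,
    "MIN_DD_CLUSTER_SIZE": 3,
    "MIN_DD_BREAKPOINT_SUPPORT": 3,
    "MIN_DD_MAP_DISTANCE": 8000,
}


def build_pindel_dd_command(reference, config, output_prefix,
                            chromosome="ALL", **dd_overrides):
    """Return an argv list for a pindel run with DD detection switched on.

    Assumption: -f/-i/-c/-o are the mandatory options; they are not
    described in this section.
    """
    unknown = set(dd_overrides) - set(DD_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown DD parameter(s): {sorted(unknown)}")

    argv = ["pindel", "-f", reference, "-i", config,
            "-c", chromosome, "-o", output_prefix, "-q"]
    # Only pass options whose value differs from the documented default.
    for name, value in dd_overrides.items():
        if value != DD_DEFAULTS[name]:
            argv += [f"--{name}", str(value)]
    return argv


# Hypothetical file names; lowers the discordance threshold to 5000.
cmd = build_pindel_dd_command("reference.fa", "bam_config.txt", "sample1",
                              MIN_DD_MAP_DISTANCE=5000)
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed directly to subprocess.run without shell quoting issues.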
DD events are reported in a file whose name ends in "_DD". An example of the output of DD detection looks like this:
####################################################################################################
1 DD reference 7068 7069 48 19 5 19 5
# Dispersed Duplication insertion (DD) found on chromosome 'reference', breakpoint at 7068 (estimated from + strand), 7069 (estimated from - strand)
# Found 48 supporting reads, of which 19 discordant reads and 5 split reads at 5' end, 19 discordant reads and 5 split reads at 3' end.
# Supporting reads for insertion location (5' end):
# Reference: TCGCCTATCTCACGATCGCCTCAATGCACCCGACGATAGGGCTCCCGTTGACCTTCAACAGCTTCGGTGGCTACTAGATACTCtattaaagggtcattggcgaaaaggcatagttgccgagggctcatggaagccagattcttcgtagattacacgacacagttcgc
# TCGCCTATCTCACGATCGCCTCAATGCACCCGACGATAGGGCTCCCGTTGACCTTCAACAGCTTCGGTGGCTACTAGATACTCCAATCCTGGCTAATCTC (name: @read_393/2 sample: sample1)
# GCCTCAATGCACCCGACGATAGGGCTCCCGTTGACCTTCAACAGCTTCGGTGGCTACTAGATACTCCAATCCTGGCTAATCTCTCATACCGGCACCGCTC (name: @read_394/2 sample: sample1)
# GATAGGGCTCCCGTTGACCTTCAACAGCTTCGGTGGCTACTAGATACTCCAATCCTGGCTAATCTCTCATACCGGCACCGCTCTGTCGGTCGCGAAATGC (name: @read_395/2 sample: sample1)
# ACCTTCAACAGCTTCGGTGGCTACTAGATACTCCAATCCTGGCTAATCTCTCATACCGGCACCGCTCTGTCGGTCGCGAAATGCAACGCCCACGTTATGG (name: @read_396/2 sample: sample1)
# TGGCTACTAGATACTCCAATCCTGGCTAATCTCTCATACCGGCACCGCTCTGTCGGTCGCGAAATGCAACGCCCACGTTATGGTGGGAGGCTTCCGCAGC (name: @read_397/2 sample: sample1)
# Supporting reads for insertion location (3' end):
# Reference: atctcacgatcgcctcaatgcacccgacgatagggctcccgttgaccttcaacagcttcggtggctactagatactcTATTAAAGGGTCATTGGCGAAAAGGCATAGTTGCCGAGGGCTCATGGAAGCCAGATTCTTCGTAGATTACACGACACAGTTCGCCACAGC
# TCGCGGCATTTATTAAAGGGTCATTGGCGAAAAGGCATAGTTGCCGAGGGCTCATGGAAGCCAGATTCTTCGTAGATTACACGACACAGTTCGCCACAGC (name: @read_457/1 sample: sample1)
# TGTTCCCCACACAGCGCTCGCGGCATTTATTAAAGGGTCATTGGCGAAAAGGCATAGTTGCCGAGGGCTCATGGAAGCCAGATTCTTCGTAGATTACACG (name: @read_456/1 sample: sample1)
# ATAGGATTGGCTCAAACTGTTCCCCACACAGCGCTCGCGGCATTTATTAAAGGGTCATTGGCGAAAAGGCATAGTTGCCGAGGGCTCATGGAAGCCAGAT (name: @read_455/1 sample: sample1)
# ATCCAGCTGGTGTTAATATAGGATTGGCTCAAACTGTTCCCCACACAGCGCTCGCGGCATTTATTAAAGGGTCATTGGCGAAAAGGCATAGTTGCCGAGG (name: @read_454/1 sample: sample1)
# TGACCCTCTATCTCAAATCCAGCTGGTGTTAATATAGGATTGGCTCAAACTGTTCCCCACACAGCGCTCGCGGCATTTATTAAAGGGTCATTGGCGAAAA (name: @read_453/1 sample: sample1)
# All supporting sequences for this insertion (i.e. sequences that map inside the inserted element):
? ? ? @read_457/1 sample1 - TCGCGGCATT
? ? ? @read_456/1 sample1 - TGTTCCCCACACAGCGCTCGCGGCATT
? ? ? @read_455/1 sample1 - ATAGGATTGGCTCAAACTGTTCCCCACACAGCGCTCGCGGCATT
? ? ? @read_454/1 sample1 - ATCCAGCTGGTGTTAATATAGGATTGGCTCAAACTGTTCCCCACACAGCGCTCGCGGCATT
? ? ? @read_453/1 sample1 - TGACCCTCTATCTCAAATCCAGCTGGTGTTAATATAGGATTGGCTCAAACTGTTCCCCACACAGCGCTCGCGGCATT
? ? ? @read_393/2 sample1 + CAATCCTGGCTAATCTC
? ? ? @read_394/2 sample1 + CAATCCTGGCTAATCTCTCATACCGGCACCGCTC
? ? ? @read_395/2 sample1 + CAATCCTGGCTAATCTCTCATACCGGCACCGCTCTGTCGGTCGCGAAATGC
? ? ? @read_396/2 sample1 + CAATCCTGGCTAATCTCTCATACCGGCACCGCTCTGTCGGTCGCGAAATGCAACGCCCACGTTATGG
? ? ? @read_397/2 sample1 + CAATCCTGGCTAATCTCTCATACCGGCACCGCTCTGTCGGTCGCGAAATGCAACGCCCACGTTATGGTGGGAGGCTTCCGCAGC
reference 136603 - @read_452/1 sample1 - TTAATAAATGCCGCGAGCGCTGTGTGGGGAACAGTTTGAGCCAATCCTATATTAACACCAGCTGGATTTGAGATAGAGGGTCAATCGGGTGCCCTGTGAC
reference 136620 - @read_451/1 sample1 - CGCTGTGTGGGGAACAGTTTGAGCCAATCCTATATTAACACCAGCTGGATTTGAGATAGAGGGTCAATCGGGTGCCCTGTGACCCCGTAGCATGGGCATA
reference 136637 - @read_450/1 sample1 - TTTGAGCCAATCCTATATTAACACCAGCTGGATTTGAGATAGAGGGTCAATCGGGTGCCCTGTGACCCCGTAGCATGGGCATAGGTAAGCTGAGCCTCAT
reference 136654 - @read_449/1 sample1 - TTAACACCAGCTGGATTTGAGATAGAGGGTCAATCGGGTGCCCTGTGACCCCGTAGCATGGGCATAGGTAAGCTGAGCCTCATCGTCCGAACTTCCGTCA
reference 136670 - @read_448/1 sample1 - TTGAGATAGAGGGTCAATCGGGTGCCCTGTGACCCCGTAGCATGGGCATAGGTAAGCTGAGCCTCATCGTCCGAACTTCCGTCAGGATAAAGGCTGGAAG
reference 136687 - @read_447/1 sample1 - TCGGGTGCCCTGTGACCCCGTAGCATGGGCATAGGTAAGCTGAGCCTCATCGTCCGAACTTCCGTCAGGATAAAGGCTGGAAGAAGTTCAGGTTCGCTAG
reference 136704 - @read_446/1 sample1 - CCGTAGCATGGGCATAGGTAAGCTGAGCCTCATCGTCCGAACTTCCGTCAGGATAAAGGCTGGAAGAAGTTCAGGTTCGCTAGTGCGGGGAGAAGCGTTC
reference 136721 - @read_445/1 sample1 - GTAAGCTGAGCCTCATCGTCCGAACTTCCGTCAGGATAAAGGCTGGAAGAAGTTCAGGTTCGCTAGTGCGGGGAGAAGCGTTCTTCGGCCCAACTAGGAC
reference 136737 - @read_444/1 sample1 - CGTCCGAACTTCCGTCAGGATAAAGGCTGGAAGAAGTTCAGGTTCGCTAGTGCGGGGAGAAGCGTTCTTCGGCCCAACTAGGACTCCTCGTTAACTGCCG
reference 136754 - @read_443/1 sample1 - GGATAAAGGCTGGAAGAAGTTCAGGTTCGCTAGTGCGGGGAGAAGCGTTCTTCGGCCCAACTAGGACTCCTCGTTAACTGCCGTGCCTCTTTGATTTTTA
reference 136771 - @read_442/1 sample1 - AGTTCAGGTTCGCTAGTGCGGGGAGAAGCGTTCTTCGGCCCAACTAGGACTCCTCGTTAACTGCCGTGCCTCTTTGATTTTTATGACGCTGAGAGGCTCG
reference 136788 - @read_441/1 sample1 - GCGGGGAGAAGCGTTCTTCGGCCCAACTAGGACTCCTCGTTAACTGCCGTGCCTCTTTGATTTTTATGACGCTGAGAGGCTCGATGATCACTCATATGTC
reference 136804 - @read_440/1 sample1 - TTCGGCCCAACTAGGACTCCTCGTTAACTGCCGTGCCTCTTTGATTTTTATGACGCTGAGAGGCTCGATGATCACTCATATGTCCGACGTTGCCACAAGG
reference 136807 + @read_416/2 sample1 + GGCCCAACTAGGACTCCTCGTTAACTGCCGTGCCTCTTTGATTTTTATGACGCTGAGAGGCTCGATGATCACTCATATGTCCGACGTTGCCACAAGGTGG
reference 136821 - @read_439/1 sample1 - TCCTCGTTAACTGCCGTGCCTCTTTGATTTTTATGACGCTGAGAGGCTCGATGATCACTCATATGTCCGACGTTGCCACAAGGTGGCTAGATCATTTCCC
reference 136823 + @read_415/2 sample1 + CTCGTTAACTGCCGTGCCTCTTTGATTTTTATGACGCTGAGAGGCTCGATGATCACTCATATGTCCGACGTTGCCACAAGGTGGCTAGATCATTTCCCGC
reference 136838 - @read_438/1 sample1 - GCCTCTTTGATTTTTATGACGCTGAGAGGCTCGATGATCACTCATATGTCCGACGTTGCCACAAGGTGGCTAGATCATTTCCCGCACGCAGGTCATATTG
reference 136840 + @read_414/2 sample1 + CTCTTTGATTTTTATGACGCTGAGAGGCTCGATGATCACTCATATGTCCGACGTTGCCACAAGGTGGCTAGATCATTTCCCGCACGCAGGTCATATTGCA
reference 136855 - @read_437/1 sample1 - GACGCTGAGAGGCTCGATGATCACTCATATGTCCGACGTTGCCACAAGGTGGCTAGATCATTTCCCGCACGCAGGTCATATTGCATCGTGTGCCAGTAGT
reference 136857 + @read_413/2 sample1 + CGCTGAGAGGCTCGATGATCACTCATATGTCCGACGTTGCCACAAGGTGGCTAGATCATTTCCCGCACGCAGGTCATATTGCATCGTGTGCCAGTAGTGT
reference 136871 - @read_436/1 sample1 - ATGATCACTCATATGTCCGACGTTGCCACAAGGTGGCTAGATCATTTCCCGCACGCAGGTCATATTGCATCGTGTGCCAGTAGTGTGGCGTATGGCTCGC
reference 136874 + @read_412/2 sample1 + ATCACTCATATGTCCGACGTTGCCACAAGGTGGCTAGATCATTTCCCGCACGCAGGTCATATTGCATCGTGTGCCAGTAGTGTGGCGTATGGCTCGCTTC
reference 136888 - @read_435/1 sample1 - CGACGTTGCCACAAGGTGGCTAGATCATTTCCCGCACGCAGGTCATATTGCATCGTGTGCCAGTAGTGTGGCGTATGGCTCGCTTCAGGCCTGAGCAAGC
reference 136890 + @read_411/2 sample1 + ACGTTGCCACAAGGTGGCTAGATCATTTCCCGCACGCAGGTCATATTGCATCGTGTGCCAGTAGTGTGGCGTATGGCTCGCTTCAGGCCTGAGCAAGCCG
reference 136905 - @read_434/1 sample1 - GGCTAGATCATTTCCCGCACGCAGGTCATATTGCATCGTGTGCCAGTAGTGTGGCGTATGGCTCGCTTCAGGCCTGAGCAAGCCGAGCACCGTCACAATC
reference 136907 + @read_410/2 sample1 + CTAGATCATTTCCCGCACGCAGGTCATATTGCATCGTGTGCCAGTAGTGTGGCGTATGGCTCGCTTCAGGCCTGAGCAAGCCGAGCACCGTCACAATCAA
reference 136924 + @read_409/2 sample1 + CGCAGGTCATATTGCATCGTGTGCCAGTAGTGTGGCGTATGGCTCGCTTCAGGCCTGAGCAAGCCGAGCACCGTCACAATCAATTGCAGTACAAAATTCG
reference 136941 + @read_408/2 sample1 + CGTGTGCCAGTAGTGTGGCGTATGGCTCGCTTCAGGCCTGAGCAAGCCGAGCACCGTCACAATCAATTGCAGTACAAAATTCGTGACCGGTCGTCGTATC
reference 136957 + @read_407/2 sample1 + GGCGTATGGCTCGCTTCAGGCCTGAGCAAGCCGAGCACCGTCACAATCAATTGCAGTACAAAATTCGTGACCGGTCGTCGTATCACATGGAGCTGTAATG
reference 136974 + @read_406/2 sample1 + AGGCCTGAGCAAGCCGAGCACCGTCACAATCAATTGCAGTACAAAATTCGTGACCGGTCGTCGTATCACATGGAGCTGTAATGAGCCGAATCGGTAGCAG
reference 136991 + @read_405/2 sample1 + GCACCGTCACAATCAATTGCAGTACAAAATTCGTGACCGGTCGTCGTATCACATGGAGCTGTAATGAGCCGAATCGGTAGCAGTAGCGCTATCCAGGGTC
reference 137008 + @read_404/2 sample1 + TGCAGTACAAAATTCGTGACCGGTCGTCGTATCACATGGAGCTGTAATGAGCCGAATCGGTAGCAGTAGCGCTATCCAGGGTCTCAGACGACCCCACAAC
reference 137024 + @read_403/2 sample1 + TGACCGGTCGTCGTATCACATGGAGCTGTAATGAGCCGAATCGGTAGCAGTAGCGCTATCCAGGGTCTCAGACGACCCCACAACACTCAACGACGACTGA
reference 137041 + @read_402/2 sample1 + ACATGGAGCTGTAATGAGCCGAATCGGTAGCAGTAGCGCTATCCAGGGTCTCAGACGACCCCACAACACTCAACGACGACTGATGCTGCGGAAGCCTCCC
reference 137058 + @read_401/2 sample1 + GCCGAATCGGTAGCAGTAGCGCTATCCAGGGTCTCAGACGACCCCACAACACTCAACGACGACTGATGCTGCGGAAGCCTCCCACCATAACGTGGGCGTT
reference 137075 + @read_400/2 sample1 + AGCGCTATCCAGGGTCTCAGACGACCCCACAACACTCAACGACGACTGATGCTGCGGAAGCCTCCCACCATAACGTGGGCGTTGCATTTCGCGACCGACA
reference 137091 + @read_399/2 sample1 + TCAGACGACCCCACAACACTCAACGACGACTGATGCTGCGGAAGCCTCCCACCATAACGTGGGCGTTGCATTTCGCGACCGACAGAGCGGTGCCGGTATG
reference 137108 + @read_398/2 sample1 + ACTCAACGACGACTGATGCTGCGGAAGCCTCCCACCATAACGTGGGCGTTGCATTTCGCGACCGACAGAGCGGTGCCGGTATGAGAGATTAGCCAGGATT
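DD calls are separated by a line of hash characters (#). Each DD call starts with a line of tab-separated values summarizing the event. As can be read from the commented summary in the example above, the values are, in this order: the event index, the event type (DD), the chromosome name, the breakpoint position estimated from the + strand, the breakpoint position estimated from the - strand, the total number of supporting reads, the number of discordant reads at the 5' end, the number of split reads at the 5' end, the number of discordant reads at the 3' end, and the number of split reads at the 3' end.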
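Going by the commented summary lines in the example above (breakpoints first, then the total and per-end read counts), a summary line can be read with a short Python sketch like the one below. The class DDSummary, its field names and the function parse_dd_summary are hypothetical, and the field order is inferred from the example rather than taken from a formal specification.

```python
from dataclasses import dataclass


@dataclass
class DDSummary:
    # Field order inferred from the example output; names are illustrative.
    index: int
    event_type: str
    chromosome: str
    breakpoint_plus: int    # breakpoint estimated from + strand
    breakpoint_minus: int   # breakpoint estimated from - strand
    supporting_reads: int
    discordant_5p: int
    split_5p: int
    discordant_3p: int
    split_3p: int


def parse_dd_summary(line):
    """Parse one tab-separated DD summary line into a DDSummary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    index, event_type, chromosome, *counts = fields
    return DDSummary(int(index), event_type, chromosome, *map(int, counts))


# Summary line taken from the example output above.
summary = parse_dd_summary("1\tDD\treference\t7068\t7069\t48\t19\t5\t19\t5")
print(summary.chromosome, summary.breakpoint_plus, summary.breakpoint_minus)
```

In the example, the total of 48 supporting reads equals the sum of the four per-end counts (19 + 5 + 19 + 5), which can serve as a quick consistency check on parsed records.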
The next few lines for each event are prefixed with a hash character (#) and show the event summary in human-readable form. If possible, the supporting split reads are also shown here, aligned to the local reference sequence to depict the likely exact breakpoint position of the event.
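The layout described so far (a separator line of hash characters, one summary line, hash-prefixed comment lines, then read lines) can be split into per-event records. The following Python sketch is an illustration under that assumption, not pindel's own parser. The function split_dd_events and the dictionary keys are hypothetical, and the sample lines are shortened from the example above.

```python
def split_dd_events(lines):
    """Group the lines of a _DD file into one dict per event."""
    events = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if set(line) == {"#"}:
            # Separator: a line made up of hash characters only.
            current = None
            continue
        if current is None:
            # First line after a separator is the tab-separated summary.
            current = {"summary": line, "comments": [], "reads": []}
            events.append(current)
        elif line.startswith("#"):
            # Human-readable summary and split-read alignments.
            current["comments"].append(line[1:].strip())
        else:
            # Remaining lines describe individual read ends.
            current["reads"].append(line)
    return events


# Shortened excerpt of the example output (sequences truncated for brevity).
sample = [
    "#" * 100,
    "1\tDD\treference\t7068\t7069\t48\t19\t5\t19\t5",
    "# Supporting reads for insertion location (5' end):",
    "# Supporting reads for insertion location (3' end):",
    "?\t?\t?\t@read_457/1\tsample1\t-\tTCGCGGCATT",
    "reference\t136603\t-\t@read_452/1\tsample1\t-\tTTAATAAATG",
]
events = split_dd_events(sample)
print(len(events), len(events[0]["comments"]), len(events[0]["reads"]))
```

For a real file, the function can be given the open file object directly, for instance split_dd_events(open("sample1_DD")), where the file name is again hypothetical.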
Finally, a number of lines are printed per event that give information on the read ends that (partly) map to the duplicated segment’s sequence. These are the following tab-separated values: