Manual

Modules available:

nanopolish extract: extract reads in FASTA or FASTQ format from a directory of FAST5 files
nanopolish call-methylation: predict genomic bases that may be methylated
nanopolish variants: detect SNPs and indels with respect to a reference genome
nanopolish variants --consensus: calculate an improved consensus sequence for a draft genome assembly
nanopolish eventalign: align signal-level events to k-mers of a reference genome
nanopolish phase-reads: phase reads using heterozygous SNVs with respect to a reference genome
nanopolish polya: estimate polyadenylated tail lengths on native RNA reads

extract

Overview

This module is used to extract reads in FASTA or FASTQ format from a directory of FAST5 files.

Input

  • path to a directory of FAST5 files modified to contain basecall information

Output

  • sequences of reads in FASTA or FASTQ format

Usage example

nanopolish extract [OPTIONS] <fast5|dir>

Argument name(s)                 Required  Default value   Description
<fast5|dir>                      Y         NA              FAST5 or path to directory of FAST5 files.
-r, --recurse                    N         NA              Recurse into subdirectories
-q, --fastq                      N         fasta format    Use when you want to extract to FASTQ format
-t, --type=TYPE                  N         2d-or-template  The type of read, one of: {template, complement, 2d, 2d-or-template, any}
-b, --basecaller=NAME[:VERSION]  N         NA              consider only data produced by basecaller NAME, optionally with given exact VERSION
-o, --output=FILE                N         stdout          Write output to FILE


index

Overview

Build an index mapping from basecalled reads to the signals measured by the sequencer

Input

  • path to directory of raw nanopore sequencing data in FAST5 format

  • basecalled reads

Output

  • gzipped FASTA file of basecalled reads (.index)

  • index files (.fai, .gzi, .readdb)

Readdb file format

The readdb file is a tab-separated file with two columns. The first column holds the read id and the second holds the path to the FAST5 file that contains the signal for that read:

read_id_1   /path/to/fast5/containing/read_id_1/signals
read_id_2   /path/to/fast5/containing/read_id_2/signals
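
As a minimal sketch of how this format can be consumed, the Python below loads a readdb file into a dictionary keyed by read id. The function name `load_readdb` is hypothetical and not part of nanopolish; the only thing taken from the description above is the two-column, tab-separated layout.

```python
def load_readdb(path):
    """Map each read id to the FAST5 path listed for it in a readdb file."""
    read_to_fast5 = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            # Two tab-separated columns: read id first, FAST5 path second.
            read_id, _, fast5_path = line.partition("\t")
            read_to_fast5[read_id] = fast5_path
    return read_to_fast5
```

A lookup such as `load_readdb("reads.fastq.index.readdb")["read_id_1"]` would then return the FAST5 path recorded for that read, assuming an index file of that name exists.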

Usage example

nanopolish index [OPTIONS] -d nanopore_raw_file_directory reads.fastq

Argument name(s)  Required  Default value  Description
-d, --directory   Y         NA             FAST5 or path to directory of FAST5 files containing ONT sequencing raw signal information.
-f, --fast5-fofn  N         NA             file containing the paths to each fast5 for the run
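
One way to prepare a file for `--fast5-fofn` is sketched below in Python. It assumes the file lists one FAST5 path per line, which is the usual file-of-filenames convention but is not spelled out above. The function name `write_fast5_fofn` is hypothetical.

```python
import os


def write_fast5_fofn(fast5_dir, fofn_path):
    """Write the path of every .fast5 file under fast5_dir, one per line."""
    paths = []
    for root, _dirs, files in os.walk(fast5_dir):
        for name in files:
            if name.endswith(".fast5"):
                paths.append(os.path.join(root, name))
    paths.sort()  # keep the output order deterministic
    with open(fofn_path, "w") as out:
        for path in paths:
            out.write(path + "\n")
    return len(paths)  # number of FAST5 files listed
```

The returned count gives a quick check that the directory walk found the expected number of files before the list is handed to `nanopolish index`.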

call-methylation

Overview

Classify nucleotides as methylated or not.

Input

  • Basecalled ONT reads in FASTA format

Output

  • tab-separated file containing per-read log-likelihood ratios

Usage example

nanopolish call-methylation [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa

Argument name(s)   Required  Default value  Description
-r, --reads=FILE   Y         NA             the ONT reads are in fasta FILE
-b, --bam=FILE     Y         NA             the reads aligned to the genome assembly are in bam FILE
-g, --genome=FILE  Y         NA             the genome we are calling methylation for is in FILE
-t, --threads=NUM  N         1              use NUM threads
--progress         N         NA             print out a progress message
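
The per-read log-likelihood ratios in the output can be turned into calls with a threshold. The Python sketch below is illustrative only. It assumes the tab-separated output starts with a header row containing a column named `log_lik_ratio`, and that positive values favour methylation. The threshold of 2.0 is an arbitrary example value, and `classify_calls` is a hypothetical helper name.

```python
import csv


def classify_calls(tsv_path, threshold=2.0):
    """Count calls as methylated, unmethylated, or ambiguous by log-likelihood ratio."""
    counts = {"methylated": 0, "unmethylated": 0, "ambiguous": 0}
    with open(tsv_path, newline="") as handle:
        # Assumed layout: header row, tab-separated, with a log_lik_ratio column.
        for row in csv.DictReader(handle, delimiter="\t"):
            llr = float(row["log_lik_ratio"])
            if llr >= threshold:
                counts["methylated"] += 1
            elif llr <= -threshold:
                counts["unmethylated"] += 1
            else:
                counts["ambiguous"] += 1  # too close to zero to call either way
    return counts
```

Treating ratios near zero as ambiguous rather than forcing a call keeps low-confidence sites out of downstream summaries.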

variants

Overview

This module is used to call single nucleotide polymorphisms (SNPs) using a signal-level HMM.

Input

  • basecalled reads

  • alignment info

  • genome assembly

Output

  • VCF file

Usage example

nanopolish variants [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa

Argument name(s)                      Required  Default value  Description
--snps                                N         NA             use flag to only call SNPs
--consensus=FILE                      N         NA             run in consensus calling mode and write polished sequence to FILE
--fix-homopolymers                    N         NA             use flag to run the experimental homopolymer caller
--faster                              N         NA             minimize compute time while slightly reducing consensus accuracy
-w, --window=STR                      N         NA             find variants in window STR (format: <chromosome_name>:<start>-<end>)
-r, --reads=FILE                      Y         NA             the ONT reads are in fasta FILE
-b, --bam=FILE                        Y         NA             the reads aligned to the reference genome are in bam FILE
-e, --event-bam=FILE                  Y         NA             the events aligned to the reference genome are in bam FILE
-g, --genome=FILE                     Y         NA             the reference genome is in FILE
-o, --outfile=FILE                    N         stdout         write result to FILE
-t, --threads=NUM                     N         1              use NUM threads
-m, --min-candidate-frequency=F       N         0.2            extract candidate variants from the aligned reads when the variant frequency is at least F
-d, --min-candidate-depth=D           N         20             extract candidate variants from the aligned reads when the depth is at least D
-x, --max-haplotypes=N                N         1000           consider at most N haplotype combinations
--max-rounds=N                        N         50             perform N rounds of consensus sequence improvement
-c, --candidates=VCF                  N         NA             read variant candidates from VCF, rather than discovering them from aligned reads
-a, --alternative-basecalls-bam=FILE  N         NA             if an alternative basecaller was used that does not output event annotations, then use basecalled sequences from FILE. The signal-level events will still be taken from the -b bam
--calculate-all-support               N         NA             when making a call, also calculate the support of the 3 other possible bases
--models-fofn=FILE                    N         NA             read alternative k-mer models from FILE
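
When many windows are processed in turn, it can help to build the command line programmatically. The Python sketch below builds a window string in the `<chromosome_name>:<start>-<end>` format and an argument list from the options listed above. The helper names `format_window` and `variants_command` are hypothetical, and the sketch only builds the command; it does not run nanopolish.

```python
def format_window(chromosome, start, end):
    """Build a window string in the <chromosome_name>:<start>-<end> format."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return f"{chromosome}:{start}-{end}"


def variants_command(reads, bam, genome, window=None, threads=1, outfile=None):
    """Assemble an argument list for `nanopolish variants`."""
    cmd = [
        "nanopolish", "variants",
        "--reads", reads,
        "--bam", bam,
        "--genome", genome,
        f"--threads={threads}",
    ]
    if window is not None:
        cmd.append(f"--window={window}")
    if outfile is not None:
        cmd.append(f"--outfile={outfile}")
    return cmd
```

On a machine where nanopolish is installed, the returned list could be passed to `subprocess.run(cmd, check=True)`. Building the list first makes it easy to log or inspect the exact command for each window.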

eventalign

Overview

Align nanopore events to reference k-mers

Input

  • basecalled reads

  • alignment information

  • assembled genome

Usage example

nanopolish eventalign [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa

Argument name(s)        Required  Default value  Description
--sam                   N         NA             use to write output in SAM format
-w, --window=STR        N         NA             compute only for window STR (format: ctg:start_id-end_id)
-r, --reads=FILE        Y         NA             the ONT reads are in fasta FILE
-b, --bam=FILE          Y         NA             the reads aligned to the genome assembly are in bam FILE
-g, --genome=FILE       Y         NA             the genome the events are aligned to is in FILE
-t, --threads=NUM       N         1              use NUM threads
--scale-events          N         NA             scale events to the model, rather than vice-versa
--progress              N         NA             print out a progress message
-n, --print-read-names  N         NA             print read names instead of indexes
--summary=FILE          N         NA             summarize the alignment of each read/strand in FILE
--samples               N         NA             write the raw samples for the event to the tsv output
--models-fofn=FILE      N         NA             read alternative k-mer models from FILE

phase-reads - (experimental)

Overview

Phase reads using heterozygous SNVs with respect to a reference genome

Input

  • basecalled reads

  • alignment information

  • assembled genome

  • variants (from nanopolish variants or from other sources, e.g. an Illumina VCF)

Usage example

nanopolish phase-reads [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa variants.vcf

polya

Overview

Estimate the number of nucleotides in the poly(A) tails of native RNA reads.

Input

  • basecalled reads

  • alignment information

  • reference transcripts

Usage example

nanopolish polya [OPTIONS] --reads=reads.fa --bam=alignments.bam --genome=ref.fa

Argument name(s)   Required  Default value  Description
-w, --window=STR   N         NA             Compute only for reads aligning to window of reference STR (format: ctg:start_id-end_id)
-r, --reads=FILE   Y         NA             the FAST(A/Q) file of native RNA reads
-b, --bam=FILE     Y         NA             the BAM file of alignments between reads and the reference
-g, --genome=FILE  Y         NA             the reference transcripts
-t, --threads=NUM  N         1              use NUM threads
-v, -vv            N         NA             -v returns raw sample log-likelihoods, while -vv returns event durations