.. _manual:

Manual
===================

Modules available: ::

    nanopolish extract: extract reads in FASTA or FASTQ format from a directory of FAST5 files
    nanopolish call-methylation: predict genomic bases that may be methylated
    nanopolish variants: detect SNPs and indels with respect to a reference genome
    nanopolish variants --consensus: calculate an improved consensus sequence for a draft genome assembly
    nanopolish eventalign: align signal-level events to k-mers of a reference genome
    nanopolish phase-reads: phase reads using heterozygous SNVs with respect to a reference genome
    nanopolish polya: estimate polyadenylated tail lengths on native RNA reads

|

extract
--------------------

Overview
"""""""""""""""""""""""

This module is used to extract reads in FASTA or FASTQ format from a directory of FAST5 files.

Input
"""""""""""""""""""""""

* path to a directory of FAST5 files modified to contain basecall information

Output
"""""""""""""""""""""""

* sequences of reads in FASTA or FASTQ format

Usage example
"""""""""""""""""""""""

::

    nanopolish extract [OPTIONS]

.. list-table::
   :widths: 20 10 20 50
   :header-rows: 1

   * - Argument name(s)
     - Required
     - Default value
     - Description

   * - (positional)
     - Y
     - NA
     - FAST5 file or path to a directory of FAST5 files.

   * - ``-r``, ``--recurse``
     - N
     - NA
     - Recurse into subdirectories

   * - ``-q``, ``--fastq``
     - N
     - fasta format
     - Use when you want to extract to FASTQ format

   * - ``-t``, ``--type=TYPE``
     - N
     - 2d-or-template
     - The type of read, one of: {template, complement, 2d, 2d-or-template, any}

   * - ``-b``, ``--basecaller=NAME[:VERSION]``
     - N
     - NA
     - Consider only data produced by basecaller NAME, optionally with the given exact VERSION

   * - ``-o``, ``--output=FILE``
     - N
     - stdout
     - Write output to FILE

|

index
--------------------

Overview
"""""""""""""""""""""""

Build an index mapping from basecalled reads to the signals measured by the sequencer.

Input
""""""""

* path to directory of raw nanopore sequencing data in FAST5 format
* basecalled reads

Output
""""""""

* gzipped FASTA file of basecalled reads (.index)
* index files (.fai, .gzi, .readdb)

Readdb file format
""""""""""""""""""""

The readdb file is a tab-separated file with two columns. The first column holds the read ID and the second holds the path to the FAST5 file that contains that read's signals: ::

    read_id_1   /path/to/fast5/containing/read_id_1/signals
    read_id_2   /path/to/fast5/containing/read_id_2/signals

Usage example
""""""""""""""

::

    nanopolish index [OPTIONS] -d nanopore_raw_file_directory reads.fastq

.. list-table::
   :widths: 20 10 20 50
   :header-rows: 1

   * - Argument name(s)
     - Required
     - Default value
     - Description

   * - ``-d``, ``--directory``
     - Y
     - NA
     - FAST5 file or path to a directory of FAST5 files containing ONT raw signal information.

   * - ``-f``, ``--fast5-fofn``
     - N
     - NA
     - File containing the paths to each FAST5 file for the run

call-methylation
--------------------

Overview
"""""""""""""""""""""""

Classify nucleotides as methylated or not.

Input
"""""""""""""""""""""""

* Basecalled ONT reads in FASTA format

Output
"""""""""""""""""""""""

* tab-separated file containing per-read log-likelihood ratios

Usage example
"""""""""""""""""""""""

::

    nanopolish call-methylation [OPTIONS]

.. list-table::
   :widths: 20 10 20 50
   :header-rows: 1

   * - Argument name(s)
     - Required
     - Default value
     - Description

   * - ``-r``, ``--reads=FILE``
     - Y
     - NA
     - the ONT reads are in fasta FILE

   * - ``-b``, ``--bam=FILE``
     - Y
     - NA
     - the reads aligned to the genome assembly are in bam FILE

   * - ``-g``, ``--genome=FILE``
     - Y
     - NA
     - the genome we are computing a consensus for is in FILE

   * - ``-t``, ``--threads=NUM``
     - N
     - 1
     - use NUM threads

   * - ``--progress``
     - N
     - NA
     - print out a progress message

variants
--------------------

Overview
"""""""""""""""""""""""

This module is used to call single nucleotide polymorphisms (SNPs) using a signal-level HMM.

Input
"""""""""""""""""""""""

* basecalled reads
* alignment info
* genome assembly

Output
"""""""""""""""""""

* VCF file

Usage example
"""""""""""""""""""""""

::

    nanopolish variants [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa

.. list-table::
   :widths: 20 10 20 50
   :header-rows: 1

   * - Argument name(s)
     - Required
     - Default value
     - Description

   * - ``--snps``
     - N
     - NA
     - use flag to only call SNPs

   * - ``--consensus=FILE``
     - N
     - NA
     - run in consensus calling mode and write polished sequence to FILE

   * - ``--fix-homopolymers``
     - N
     - NA
     - use flag to run the experimental homopolymer caller

   * - ``--faster``
     - N
     - NA
     - minimize compute time while slightly reducing consensus accuracy

   * - ``-w``, ``--window=STR``
     - N
     - NA
     - find variants in window STR (format: ctg:start_id-end_id)

   * - ``-r``, ``--reads=FILE``
     - Y
     - NA
     - the ONT reads are in fasta FILE

   * - ``-b``, ``--bam=FILE``
     - Y
     - NA
     - the reads aligned to the reference genome are in bam FILE

   * - ``-e``, ``--event-bam=FILE``
     - Y
     - NA
     - the events aligned to the reference genome are in bam FILE

   * - ``-g``, ``--genome=FILE``
     - Y
     - NA
     - the reference genome is in FILE

   * - ``-o``, ``--outfile=FILE``
     - N
     - stdout
     - write result to FILE

   * - ``-t``, ``--threads=NUM``
     - N
     - 1
     - use NUM threads

   * - ``-m``, ``--min-candidate-frequency=F``
     - N
     - 0.2
     - extract candidate variants from the aligned reads when the variant frequency is at least F

   * - ``-d``, ``--min-candidate-depth=D``
     - N
     - 20
     - extract candidate variants from the aligned reads when the depth is at least D

   * - ``-x``, ``--max-haplotypes=N``
     - N
     - 1000
     - consider at most N haplotype combinations

   * - ``--max-rounds=N``
     - N
     - 50
     - perform N rounds of consensus sequence improvement

   * - ``-c``, ``--candidates=VCF``
     - N
     - NA
     - read variant candidates from VCF, rather than discovering them from aligned reads

   * - ``-a``, ``--alternative-basecalls-bam=FILE``
     - N
     - NA
     - if an alternative basecaller was used that does not output event annotations, use basecalled sequences from FILE. The signal-level events will still be taken from the ``-b`` bam

   * - ``--calculate-all-support``
     - N
     - NA
     - when making a call, also calculate the support of the 3 other possible bases

   * - ``--models-fofn=FILE``
     - N
     - NA
     - read alternative k-mer models from FILE

eventalign
--------------------

Overview
"""""""""""""""""""""""

Align nanopore events to reference k-mers.

Input
"""""""""""""""""""""""

* basecalled reads
* alignment information
* assembled genome

Usage example
"""""""""""""""""""""""

::

    nanopolish eventalign [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa

.. list-table::
   :widths: 20 10 20 50
   :header-rows: 1

   * - Argument name(s)
     - Required
     - Default value
     - Description

   * - ``--sam``
     - N
     - NA
     - use to write output in SAM format

   * - ``-w``, ``--window=STR``
     - N
     - NA
     - Compute the consensus for window STR (format: ctg:start_id-end_id)

   * - ``-r``, ``--reads=FILE``
     - Y
     - NA
     - the ONT reads are in fasta FILE

   * - ``-b``, ``--bam=FILE``
     - Y
     - NA
     - the reads aligned to the genome assembly are in bam FILE

   * - ``-g``, ``--genome=FILE``
     - Y
     - NA
     - the genome we are computing a consensus for is in FILE

   * - ``-t``, ``--threads=NUM``
     - N
     - 1
     - use NUM threads

   * - ``--scale-events``
     - N
     - NA
     - scale events to the model, rather than vice-versa

   * - ``--progress``
     - N
     - NA
     - print out a progress message

   * - ``-n``, ``--print-read-names``
     - N
     - NA
     - print read names instead of indexes

   * - ``--summary=FILE``
     - N
     - NA
     - summarize the alignment of each read/strand in FILE

   * - ``--samples``
     - N
     - NA
     - write the raw samples for the event to the tsv output

   * - ``--models-fofn=FILE``
     - N
     - NA
     - read alternative k-mer models from FILE

phase-reads - (experimental)
----------------------------

Overview
"""""""""""""""""""""""

Phase reads using heterozygous SNVs with respect to a reference genome.

Input
"""""""""""""""""""""""

* basecalled reads
* alignment information
* assembled genome
* variants (from nanopolish variants or from other sources, e.g. an Illumina VCF)

Usage example
"""""""""""""""""""""""

::

    nanopolish phase-reads [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa variants.vcf

.. list-table::
   :widths: 20 10 20 50
   :header-rows: 1

   * - Argument name(s)
     - Required
     - Default value
     - Description

   * - ``-v``
     - N
     - NA
     - write verbose output

   * - ``-w``, ``--window=STR``
     - N
     - NA
     - Only phase reads in the window STR (format: ctg:start_id-end_id)

   * - ``-r``, ``--reads=FILE``
     - Y
     - NA
     - the ONT reads are in fasta FILE

   * - ``-b``, ``--bam=FILE``
     - Y
     - NA
     - the reads aligned to the genome assembly are in bam FILE

   * - ``-g``, ``--genome=FILE``
     - Y
     - NA
     - the genome we are computing a consensus for is in FILE

   * - ``variants.vcf``
     - Y
     - NA
     - the variants (from nanopolish variants or Illumina in VCF format) to be phased are in FILE

   * - ``-t``, ``--threads=NUM``
     - N
     - 1
     - use NUM threads

   * - ``--progress``
     - N
     - NA
     - print out a progress message

polya
--------------------

Overview
"""""""""""""""""""""""

Estimate the number of nucleotides in the poly(A) tails of native RNA reads.

Input
"""""""""""""""""""""""

* basecalled reads
* alignment information
* reference transcripts

Usage example
"""""""""""""""""""""""

::

    nanopolish polya [OPTIONS] --reads=reads.fa --bam=alignments.bam --genome=ref.fa

.. list-table::
   :widths: 20 10 20 50
   :header-rows: 1

   * - Argument name(s)
     - Required
     - Default value
     - Description

   * - ``-w``, ``--window=STR``
     - N
     - NA
     - Compute only for reads aligning to window of reference STR (format: ctg:start_id-end_id)

   * - ``-r``, ``--reads=FILE``
     - Y
     - NA
     - the FAST(A/Q) file of native RNA reads

   * - ``-b``, ``--bam=FILE``
     - Y
     - NA
     - the BAM file of alignments between reads and the reference

   * - ``-g``, ``--genome=FILE``
     - Y
     - NA
     - the reference transcripts

   * - ``-t``, ``--threads=NUM``
     - N
     - 1
     - use NUM threads

   * - ``-v``, ``-vv``
     - N
     - NA
     - ``-v`` returns raw sample log-likelihoods, while ``-vv`` returns event durations
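
Several modules above accept a ``--window=STR`` argument in the form ``ctg:start_id-end_id``. When building such strings in a script, it can help to validate them before launching a long run. The following is a minimal sketch, not nanopolish's own parser: the ``Window`` type and ``parse_window`` function are hypothetical helpers introduced for illustration, and the sketch makes no assumption about whether coordinates are 0-based or 1-based.

```python
from typing import NamedTuple


class Window(NamedTuple):
    """Hypothetical container for a parsed window string."""
    contig: str
    start: int
    end: int


def parse_window(window: str) -> Window:
    """Split a 'ctg:start_id-end_id' string into contig, start and end.

    Illustrative helper only; it is not part of nanopolish.
    """
    # Split on the last colon so contig names that contain ':' are kept whole.
    contig, colon, coords = window.rpartition(":")
    if not colon or not contig:
        raise ValueError(f"expected 'ctg:start_id-end_id', got {window!r}")

    # Split the coordinate part on the first dash.
    start_text, dash, end_text = coords.partition("-")
    if not dash:
        raise ValueError(f"expected 'start_id-end_id' after ':', got {coords!r}")

    # int() raises ValueError for empty or non-numeric coordinates.
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"window start {start} is greater than end {end}")
    return Window(contig, start, end)


if __name__ == "__main__":
    w = parse_window("chr20:5000000-5010000")
    print(w.contig, w.start, w.end)
```

A validated window can then be formatted back with ``f"{w.contig}:{w.start}-{w.end}"`` and passed as the value of ``--window``.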