usage: micca filter [-h] -i FILE -o FILE [-e MAXEERATE] [-m MINLEN] [-t]
[-n MAXNS] [-f {fastq,fasta}]
micca filter filters sequences according to the maximum allowed
expected error (EE) rate %. Optionally, you can:
* discard sequences that are shorter than the specified length
  (option --minlen MINLEN); suggested for Illumina overlapping
  paired-end reads that have already been merged;
* discard sequences that are shorter than the specified length AND
  truncate sequences that are longer (options --minlen MINLEN
  --trunc); suggested for Illumina and 454 unpaired reads;
* discard sequences that contain more than a specified number of Ns
  (option --maxns MAXNS).
Sequences are first truncated (if --trunc is given) and then filtered.
Overlapping paired reads should be merged first (using
micca-mergepairs) and then filtered.
The expected error (EE) rate % in a sequence of length L is defined
as (doi: 10.1093/bioinformatics/btv401):

                 sum(error probabilities)
    EE rate % = -------------------------- * 100
                            L
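
As an illustration of the definition above, the following is a minimal Python sketch that computes the EE rate % of a single read from its FASTQ quality string. It assumes phred+33 encoding (as required for the input file) and the standard Phred relation p = 10^(-Q/10). The function name ee_rate_percent is hypothetical and is not part of micca or VSEARCH.

```python
def ee_rate_percent(quality: str, offset: int = 33) -> float:
    """Expected error rate (%) of one read, from its FASTQ quality string.

    Hypothetical helper for illustration only; assumes phred+33 by default.
    """
    if not quality:
        raise ValueError("quality string must not be empty")
    # Each quality character encodes a Phred score Q = ord(char) - offset,
    # and Q maps to an error probability p = 10 ** (-Q / 10).
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    # EE rate % = sum(error probabilities) / L * 100
    return sum(error_probs) / len(error_probs) * 100


if __name__ == "__main__":
    # '5' encodes Q20 (p = 0.01), 'I' encodes Q40 (p = 0.0001) in phred+33.
    print(ee_rate_percent("5555"))
    print(ee_rate_percent("IIII"))
```

A read made only of Q20 bases has an EE rate of about 1%, which is the default threshold of --maxeerate.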
Before filtering, run 'micca filterstats' to see how many reads will
pass the filter at different minimum lengths with or without
truncation, given a maximum allowed expected error rate % and maximum
allowed number of Ns.
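
The kind of summary described above can be sketched in Python as follows. This is only an illustration of the idea, not the implementation or output format of 'micca filterstats': the function names ee_rate and pass_counts are hypothetical, reads are assumed to be (sequence, quality) pairs in phred+33 encoding, and truncation is assumed to cut reads to exactly the minimum length.

```python
def ee_rate(qual, offset=33):
    """EE rate % of a non-empty phred+33 quality string (hypothetical helper)."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual) / len(qual) * 100


def pass_counts(reads, minlens, maxeerate=1.0, maxns=None):
    """Count reads passing the filter at each minimum length.

    Returns {minlen: (passed_without_truncation, passed_with_truncation)}.
    """
    counts = {}
    for minlen in minlens:
        if minlen < 1:
            raise ValueError("minimum length must be >= 1")
        passed = {False: 0, True: 0}
        for seq, qual in reads:
            if len(seq) < minlen:
                continue  # too short: discarded in both modes
            for truncate in (False, True):
                # Truncate first (if enabled), then apply the filters.
                s, q = (seq[:minlen], qual[:minlen]) if truncate else (seq, qual)
                if maxns is not None and s.upper().count("N") > maxns:
                    continue
                if ee_rate(q) > maxeerate:
                    continue
                passed[truncate] += 1
        counts[minlen] = (passed[False], passed[True])
    return counts


if __name__ == "__main__":
    reads = [("ACGTACGT", "IIII5555"), ("ACGT", "IIII"), ("AC", "55")]
    for minlen, (plain, trunc) in pass_counts(reads, [2, 4, 8], 0.5).items():
        print(minlen, plain, trunc)
```

In this toy data set the first read has a low-quality tail, so it passes only when truncation removes that tail.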
micca-filter is based on VSEARCH (https://github.com/torognes/vsearch).
optional arguments:
-h, --help show this help message and exit
arguments:
-i FILE, --input FILE
input FASTQ file, Sanger/Illumina 1.8+ format
(phred+33) (required).
-o FILE, --output FILE
output FASTA/FASTQ file (required).
-e MAXEERATE, --maxeerate MAXEERATE
discard sequences with more than the specified expected
error rate % (values <=1%, i.e. less than or equal to
one error per 100 bases, are highly recommended).
Sequences are discarded after truncation (if enabled)
(default 1).
-m MINLEN, --minlen MINLEN
discard sequences that are shorter than MINLEN
(default 1).
-t, --trunc truncate sequences that are longer than MINLEN to
MINLEN bases (disabled by default).
-n MAXNS, --maxns MAXNS
discard sequences with more than the specified number
of Ns. Sequences are discarded after truncation
(disabled by default).
-f {fastq,fasta}, --output-format {fastq,fasta}
file format (default fasta).
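
How the options above interact can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual micca or VSEARCH code: the function name filter_read is hypothetical, quality strings are assumed to be phred+33, and --trunc is assumed to cut reads to exactly MINLEN bases before the Ns and EE rate checks are applied.

```python
def filter_read(seq, qual, maxeerate=1.0, minlen=1, trunc=False, maxns=None):
    """Return the (possibly truncated) (seq, qual) pair, or None if discarded.

    Hypothetical sketch of the documented order: length check, optional
    truncation, then the Ns and expected error rate filters.
    """
    if not seq or len(seq) < minlen:
        return None  # --minlen: discard reads shorter than MINLEN
    if trunc:
        seq, qual = seq[:minlen], qual[:minlen]  # --trunc: cut to MINLEN
    if maxns is not None and seq.upper().count("N") > maxns:
        return None  # --maxns: too many Ns after truncation
    # EE rate % = sum(error probabilities) / L * 100, with p = 10^(-Q/10)
    ee_rate = sum(10 ** (-(ord(c) - 33) / 10) for c in qual) / len(qual) * 100
    if ee_rate > maxeerate:
        return None  # --maxeerate: EE rate % above the threshold
    return seq, qual


if __name__ == "__main__":
    # 'I' encodes Q40 and '5' encodes Q20 in phred+33.
    print(filter_read("ACGTACGT", "IIII5555", maxeerate=0.5, minlen=4, trunc=True))
    print(filter_read("ACGTACGT", "IIII5555", maxeerate=0.5, minlen=4))
```

Because filtering happens after truncation, a read with a low-quality tail can pass with --trunc and fail without it.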
Examples
Truncate reads at 300 bp, discard low-quality sequences
(with EE rate > 0.5%) and write a FASTA file:
micca filter -i reads.fastq -o filtered.fasta -m 300 -t -e 0.5