split

usage: micca split [-h] -i FILE -o FILE -b FILE [-n FILE] [-c FILE] [-s N]
                [-e MAXE] [-t] [-f {fastq,fasta}]

micca split assigns multiplexed reads to samples based on their 5'
nucleotide barcode (demultiplexing), using the barcodes provided in a
FASTA file (--barcode). It creates a single FASTQ or FASTA file in
which the sample information (e.g. >SEQID;sample=SAMPLENAME) is
appended to the sequence identifier. By default, the barcode and the
sequence preceding it are removed, e.g.:

Barcode file:        Input file:

>SAMPLE1             >SEQ1
TCAGTCAG             TCAGTCAGGCCACGGCTAACTAC...
...                  ...

the output will be:

>SEQ1;sample=SAMPLE1
GCCACGGCTAACTAC...
...
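
The transformation shown above can be sketched in a few lines of Python. This is an illustration only, not micca's implementation: the function name `demultiplex` and the read `SEQ2` are hypothetical, and the sketch assumes exact barcode matching at the very start of each read (no --skip, no allowed errors).

```python
def demultiplex(reads, barcodes):
    """Assign each read to the sample whose barcode it starts with.

    reads:    iterable of (seq_id, sequence) tuples
    barcodes: dict mapping sample name -> barcode sequence
    Assumption: exact prefix match only; this is not micca's own algorithm.
    """
    matched, unmatched = [], []
    for seq_id, seq in reads:
        for sample, barcode in barcodes.items():
            if seq.startswith(barcode):
                # Trim the barcode and tag the identifier with the sample name.
                matched.append((f"{seq_id};sample={sample}", seq[len(barcode):]))
                break
        else:
            # No barcode matched: keep the read untouched (compare --notmatched).
            unmatched.append((seq_id, seq))
    return matched, unmatched


barcodes = {"SAMPLE1": "TCAGTCAG"}
reads = [
    ("SEQ1", "TCAGTCAGGCCACGGCTAACTAC"),
    ("SEQ2", "AAAAAAAAGCCACGGCTAACTAC"),  # hypothetical read with no known barcode
]
matched, unmatched = demultiplex(reads, barcodes)
for seq_id, seq in matched:
    print(f">{seq_id}\n{seq}")
```

Reads that match no barcode end up in `unmatched`, which corresponds to what the --notmatched option writes to a separate file.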

optional arguments:
-h, --help            show this help message and exit

arguments:
-i FILE, --input FILE
                        input FASTQ/FASTA file (required).
-o FILE, --output FILE
                        output FASTQ/FASTA file (required).
-b FILE, --barcode FILE
                        barcode file in FASTA format (required).
-n FILE, --notmatched FILE
                        write reads in which no barcode was found.
-c FILE, --counts FILE
                        write barcode counts in a tab-delimited file.
-s N, --skip N        skip N bases before barcode matching (e.g. if your
                        sequences start with the control sequence 'TCAG'
                        followed by the barcode, set to 4) (>=0, default 0).
-e MAXE, --maxe MAXE  maximum number of allowed errors (>=0, default 1).
-t, --notrim          do not trim the barcode and the sequence preceding it
                        from sequences.
-f {fastq,fasta}, --format {fastq,fasta}
                        file format (default fastq).
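
How --skip, --maxe and --notrim interact can be illustrated with a small Python sketch. It is a reading of the option descriptions above, not micca's code: the helper name `match_and_trim` is hypothetical, and counting "errors" as mismatched positions (Hamming distance) is an assumption, since the help text does not say how errors are measured.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def match_and_trim(seq, barcode, skip=0, maxe=1, notrim=False):
    """Return the processed read, or None if the barcode is not found.

    skip:   bases to ignore before the barcode (compare --skip)
    maxe:   maximum number of mismatches tolerated (compare --maxe;
            treating errors as mismatches is an assumption)
    notrim: keep the read unchanged when True (compare --notrim)
    """
    window = seq[skip:skip + len(barcode)]
    if len(window) < len(barcode) or hamming(window, barcode) > maxe:
        return None
    if notrim:
        return seq
    # Remove the barcode together with everything that precedes it.
    return seq[skip + len(barcode):]


# Control sequence 'TCAG', then the barcode, then the amplicon.
read = "TCAG" + "ACGTACGT" + "GCCACGG"
print(match_and_trim(read, "ACGTACGT", skip=4))
print(match_and_trim(read, "ACGTACGT", skip=0))
print(match_and_trim(read, "ACGTACGT", skip=4, notrim=True))
```

With `skip=4` the barcode is found and the read is trimmed to the amplicon; with `skip=0` the first eight bases differ from the barcode in more than one position, so the read is rejected.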

Examples

Split 'input.fastq' and write the sequences in which no barcode was
found to the file 'notmatched.fastq':

    micca split -i input.fastq -o splitted.fastq -b barcode.fasta \
    -n notmatched.fastq
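
As a quick sanity check on the result, the sample annotations in the output can be tallied with a short Python script. This is a minimal sketch under stated assumptions: the helper `count_samples` is hypothetical, records are assumed to be unwrapped four-line FASTQ, and the in-memory example data stands in for a real output file such as 'splitted.fastq'.

```python
import io
import re
from collections import Counter

# Matches the ';sample=NAME' annotation appended to the sequence identifier.
SAMPLE_RE = re.compile(r";sample=([^;\s]+)")


def count_samples(fastq_handle):
    """Count reads per sample from the header lines of a FASTQ stream.

    Assumption: each record spans exactly four lines (unwrapped FASTQ).
    """
    counts = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 == 0:  # first line of each record is the header
            match = SAMPLE_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


# Made-up records for illustration; replace with open("splitted.fastq").
example = io.StringIO(
    "@SEQ1;sample=SAMPLE1\nGCCACGG\n+\nIIIIIII\n"
    "@SEQ2;sample=SAMPLE2\nACGTACG\n+\nIIIIIII\n"
    "@SEQ3;sample=SAMPLE1\nTTGACCA\n+\nIIIIIII\n"
)
counts = count_samples(example)
for sample, n in sorted(counts.items()):
    print(f"{sample}\t{n}")
```

The --counts option already writes barcode counts to a tab-delimited file; a script like this is only useful for cross-checking an existing output file.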