Ideas/To Do =========== This is a rather unsorted list of features that would be nice to have, of things that could be improved in the source code, and of possible algorithmic improvements. - show average error rate - In colorspace and probably also for Illumina data, gapped alignment is not necessary - ``--progress`` - run pylint, pychecker - length histogram - check whether input is FASTQ although -f fasta is given - search for adapters in the order in which they are given on the command line - more tests for the alignment algorithm - deprecate ``--rest-file`` - ``--detect`` prints out best guess which of the given adapters is the correct one - alignment algorithm: make a 'banded' version - it seems the str.find optimization isn't very helpful. In any case, it should be moved into the Aligner class. - allow to remove not the adapter itself, but the sequence before or after it - instead of trimming, convert adapter to lowercase - warn when given adapter sequence contains non-IUPAC characters - try multithreading again, this time use os.pipe() or 0mq Specifying adapters ------------------- The idea is to deprecate the ``-b`` and ``-g`` parameters. Only ``-a`` is used with a special syntax for each adapter type. This makes it a bit easier to add new adapter types in the feature. .. csv-table:: back,``-a ADAPTER``,``-a ADAPTER`` or ``-a ...ADAPTER`` suffix,``-a ADAPTER$``,``-a ...ADAPTER$`` front,``-g ADAPTER``,``-a ADAPTER...`` prefix,``-g ^ADAPTER``,``-a ^ADAPTER...`` (or have anchoring by default?) anywhere,``-b ADAPTER``, ``-a ...ADAPTER...`` ??? paired,(not implemented),``-a ADAPTER...ADAPTER`` or ``-a ^ADAPTER...ADAPTER`` Or add only ``-a ADAPTER...`` as an alias for ``-g ^ADAPTER`` and ``-a ...ADAPTER`` as an alias for ``-a ADAPTER``. The ``...`` would be equivalent to ``N*`` as in regular expressions. Another idea: Allow something such as ``-a ADAP$TER`` or ``-a ADAPTER$NNN``. This would be a way to specify less strict anchoring. Make it possible to specify that the rightmost or leftmost match should be picked. Default right now: Leftmost, even for -g adapters. Allow ``N{3,10}`` as in regular expressions (for a variable-length sequence). Use parentheses to specify the part of the sequence that should be kept: * ``-a (...)ADAPTER`` (default) * ``-a (...ADAPTER)`` (default) * ``-a ADAPTER(...)`` (default) * ``-a (ADAPTER...)`` (??) Or, specify the part that should be removed: ``-a ...(ADAPTER...)`` ``-a ...ADAPTER(...)`` ``-a (ADAPTER)...`` Model somehow all the flags that exist for semiglobal alignment. For start of the adapter: * Start of adapter can be degraded or not * Bases are allowed to be before adapter or not Not degraded and no bases before allowed = anchored. Degraded and bases before allowed = regular 5' By default, the 5' end should be anchored, the 3' end not. * ``-a ADAPTER...`` → not degraded, no bases before allowed * ``-a N*ADAPTER...`` → not degraded, bases before allowed * ``-a ADAPTER^...`` → degraded, no bases before allowed * ``-a N*ADAPTER^...`` → degraded, bases before allowed * ``-a ...ADAPTER`` → degraded, bases after allowed * ``-a ...ADAPTER$`` → not degraded, no bases after allowed Paired-end trimming ------------------- * Could also use a paired-end read merger, then remove adapters with -a and -g Available/used letters for command-line options ----------------------------------------------- * Remaining characters: All uppercase letters except A, B, G, M, N, O, U * Lowercase letters: i, j, k, l, s, w * Planned/reserved: Q (paired-end quality trimming), j (multithreading)