Preprocessing: kneaddata for read filtering

Feature request

I propose using kneaddata for read filtering.

This tool aims to perform principled in silico separation of bacterial reads from these "contaminant" reads, be they from the host, from bacterial 16S sequences, or other user-defined sources.

can be installed via conda
can use multiple references for filtering
outputs reads mapped to each given reference in separate FASTQ files
(runs fastqc for the input/output FASTQ files)

The rRNA filtering step could be folded into the kneaddata call, or it could remain a separate rule. Either way, this would reduce the code complexity considerably: there would be no need for the "chained" FASTQ files that carry multiple filtering suffixes in their names.

The trimming step built into kneaddata can be skipped, and has to be, because the optional poly-G trimming has to be done before filtering.
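
To make the proposal more concrete, here is a rough sketch of how a rule could assemble the kneaddata call for paired-end reads: one `--reference-db` per reference, trimming bypassed, FastQC optional. The flag names are as I understand them from the kneaddata documentation (older releases took `--input` twice instead of `--input1`/`--input2`), so please check them against `kneaddata --help` for the installed version. The helper name and all paths are hypothetical and not taken from the pipeline.

```python
import shlex


def kneaddata_command(reads_1, reads_2, reference_dbs, output_dir,
                      threads=4, run_fastqc=False):
    """Build the argument list for one paired-end kneaddata run.

    Sketch only: flag names are assumed from the kneaddata docs and
    should be verified against the installed version.
    """
    cmd = [
        "kneaddata",
        "--input1", reads_1,
        "--input2", reads_2,
        "--output", output_dir,
        "--threads", str(threads),
        # Skip the built-in trimming: poly-G trimming already ran upstream.
        "--bypass-trim",
    ]
    # One --reference-db per reference (host, rRNA, user-defined, ...).
    for db in reference_dbs:
        cmd += ["--reference-db", db]
    if run_fastqc:
        cmd += ["--run-fastqc-start", "--run-fastqc-end"]
    return cmd


# Hypothetical paths, for illustration only.
example = kneaddata_command(
    "sample_R1.trimmed.fastq.gz",
    "sample_R2.trimmed.fastq.gz",
    ["refs/host_bowtie2", "refs/rrna_bowtie2"],
    "results/kneaddata/sample",
    run_fastqc=True,
)
print(shlex.join(example))
```

Returning a list instead of a shell string keeps the call safe to pass to `subprocess.run` without quoting issues, and adding the rRNA database is then just one more entry in `reference_dbs`.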

kneaddata:

web site
repo
tutorials
forum