Preprocessing: kneaddata for read filtering
Feature request
I propose using `kneaddata` for read filtering.
This tool aims to perform principled in silico separation of bacterial reads from these "contaminant" reads, be they from the host, from bacterial 16S sequences, or other user-defined sources.
- can be installed via `conda`
- can use multiple references for filtering
- outputs reads mapped to each given reference in separate FASTQ files
- (runs `fastqc` for the input/output FASTQ files)
The rRNA filtering step could be included there as well, or it could remain a separate rule. With or without the rRNA filtering, this would reduce the code complexity considerably: the "chained" FASTQ files with multiple filtering suffixes in their names would no longer be needed.
The trimming step included in `kneaddata` can, and has to, be skipped, because the optional poly-G trimming has to be done before filtering.
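To make the proposal concrete, here is a minimal sketch of how such a call could be assembled, with trimming bypassed and one reference database per contaminant source. The helper name `build_kneaddata_cmd`, the file paths, and the database names are hypothetical. The paired-end input flags `--input1`/`--input2` are an assumption based on recent kneaddata releases (older releases used `--input`), so please check `kneaddata --help` for the installed version before relying on them.

```python
import shlex
from typing import List, Sequence


def build_kneaddata_cmd(
    r1: str,
    r2: str,
    out_dir: str,
    reference_dbs: Sequence[str],
    threads: int = 4,
    run_fastqc: bool = False,
) -> List[str]:
    """Assemble a kneaddata command line (hypothetical helper, not part of kneaddata)."""
    if not reference_dbs:
        raise ValueError("at least one reference database is required")

    cmd = [
        "kneaddata",
        # Assumption: paired-end flags of recent releases; older ones used --input.
        "--input1", r1,
        "--input2", r2,
        "--output", out_dir,
        "--threads", str(threads),
        # Skip the built-in trimming: poly-G trimming has already been done upstream.
        "--bypass-trim",
    ]

    # One --reference-db per contaminant source (e.g. host, rRNA).
    for db in reference_dbs:
        cmd += ["--reference-db", db]

    # Optional FastQC reports for the input and output FASTQ files.
    if run_fastqc:
        cmd += ["--run-fastqc-start", "--run-fastqc-end"]

    return cmd


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    example = build_kneaddata_cmd(
        "sample_R1.trimmed.fastq.gz",
        "sample_R2.trimmed.fastq.gz",
        "results/kneaddata/sample",
        ["db/host", "db/rrna"],
        run_fastqc=True,
    )
    print(shlex.join(example))
```

Returning the command as a list keeps it easy to pass to `subprocess.run` or to join into the `shell:` section of a rule. Whether rRNA is filtered in the same call or in a separate rule then only changes the list of databases passed in.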
`kneaddata`: