Bug: bbmap: quality encoding offset for LR (GDB, preprocessing)
Using bbmap's parameters ignorebadquality qin=64 qout=64 for long reads (LR) when processing them appears to be wrong, i.e. the parameters need to be changed in this line and the results updated.
Since this rule is used only for GDB (to remove host contamination), only the LR/HY results for GDB need to be updated.
Proof:
- The quality strings differ between the input and output FASTQ files: some characters are replaced by @.
- testformat2.sh from the bbmap tools reports a quality offset of 33, but fails or prints warnings if the offset is not set or is set to 64.
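A minimal sketch of one mechanism that would produce the @ characters, assuming that with ignorebadquality negative scores are clamped to 0 rather than rejected (an assumption, not verified against the BBTools source). Phred+33 characters below @ decode to negative scores under offset 64, and Q0 re-encoded with offset 64 is @. The function name and the example quality string are hypothetical.

```python
def reencode_quality(qual: str, qin: int, qout: int, floor: int = 0) -> str:
    """Hypothetical model of the rewrite: decode each character with offset
    `qin`, raise any score below `floor` to `floor`, re-encode with `qout`."""
    return "".join(chr(max(ord(c) - qin, floor) + qout) for c in qual)


# Example Phred+33 quality string (made up, not taken from lr.fastq.gz).
raw = "5?8@I"
print(reencode_quality(raw, qin=64, qout=64))  # wrong offset -> @@@@I
print(reencode_quality(raw, qin=33, qout=33))  # correct offset -> 5?8@I
```

Under offset 64 the character 8 (ASCII 56) decodes to -8, which is also below the -5 threshold named in the abort message quoted further down.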
Checking file format:
testformat2.sh trim=f sketch=f merge=f /scratch/users/vgalata/gdb/basecalling/lr.fastq.gz
# Warning! Changed from ASCII-33 to ASCII-64 on input 8: 56 -> 25
# Up to 641 prior reads may have been generated with incorrect qualities.
# If this is a problem you may wish to re-run with the flag 'qin=33' or 'qin=64'.
#
# The ASCII quality encoding offset (64) is not set correctly, or the reads are corrupt; quality value below -5.
# Please re-run with the flag 'qin=33', 'ignorebadquality', or '-da'.
# Problematic read number 641:
# [...]
# Offset=64
# java.lang.Exception: Aborting.
# [...]
testformat2.sh qin=33 trim=f sketch=f merge=f /scratch/users/vgalata/gdb/basecalling/lr.fastq.gz
# Format fastq
# Compression gz
# Interleaved false
# [...]
# QualOffset 33
# [...]
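The decision testformat2.sh reports can be roughly approximated by looking at the lowest quality character in the file. This is a heuristic sketch, not the algorithm BBTools uses: any character below ; (ASCII 59) rules out a +64 offset, while a file whose lowest character is @ or above is only assumed to be +64 (high-quality Phred+33 data could also look like that). It also assumes unwrapped 4-line FASTQ records; the function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, Optional


def quality_lines(path: str) -> Iterator[str]:
    """Yield the quality line of each record, assuming unwrapped 4-line FASTQ."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 3:  # 4th line of every record holds the qualities
                yield line.rstrip("\n")


def guess_offset(quals: Iterable[str]) -> Optional[int]:
    """Guess the ASCII quality offset from the lowest character seen.

    Returns 33, 64, or None when the data do not decide it.
    """
    lowest = min((ord(c) for q in quals for c in q), default=None)
    if lowest is None:
        return None   # no quality data at all
    if lowest < 59:
        return 33     # below ';' (Solexa -5): impossible with a +64 offset
    if lowest >= 64:
        return 64     # nothing below '@' (Q0 in Phred+64); still only a guess
    return None       # 59..63: ambiguous (Phred+33 or Solexa+64)
```

For example, guess_offset(itertools.islice(quality_lines("/scratch/users/vgalata/gdb/basecalling/lr.fastq.gz"), 10000)) inspects only the first 10,000 reads instead of the whole file.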
TODOs
- change parameters in the rule
- remove preprocessed LR files (link lr.proc.fastq.gz and file lr.nohost.fastq.gz) for GDB
- re-run "preprocessing" for LR for GDB
- re-run "assembly" for GDB
- re-run "mapping" for GDB
- re-run "annotation" for GDB
- re-run "analysis" for GDB
- re-run "taxonomy" for GDB
- re-create reports
- re-create GDB extra-analysis: rgi
- re-create GDB extra-analysis: barrnap, metaT
- re-create GDB extra-analysis: metaT average coverage of unique mmseqs2 proteins
- re-create metaP results
- re-create paper figures
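After re-running preprocessing, one possible check is that reads present in both the input and the output of the host-removal rule keep identical quality strings. A minimal sketch, assuming unwrapped 4-line FASTQ and that the rule does not trim reads (so any difference points to re-encoding); which two files to compare (for example lr.proc.fastq.gz against lr.nohost.fastq.gz) is an assumption, and all function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (read ID, quality string)


def open_fastq(path: str) -> Iterator[str]:
    """Yield text lines from a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        yield from fh


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (read ID, quality) from unwrapped 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank trailing lines
        _seq, _plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None:
            raise ValueError(f"truncated FASTQ record: {header.strip()}")
        # Read ID = first whitespace-separated token after the leading '@'.
        yield header[1:].split()[0], qual.rstrip("\n")


def count_changed(raw: Iterable[Record], processed: Iterable[Record]) -> Tuple[int, int]:
    """Return (reads present in both inputs, reads whose qualities differ)."""
    # Store only a hash per raw read to keep memory low for long reads.
    raw_hashes = {rid: hash(q) for rid, q in raw}
    compared = changed = 0
    for rid, q in processed:
        if rid in raw_hashes:
            compared += 1
            if raw_hashes[rid] != hash(q):
                changed += 1
    return compared, changed
```

Usage: count_changed(parse_fastq(open_fastq("lr.proc.fastq.gz")), parse_fastq(open_fastq("lr.nohost.fastq.gz"))) should report 0 changed reads once the offset is fixed.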