ONT_pilot_gitlab issues
https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues
2021-04-21T14:51:08+02:00

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/128
utils: ave_gene_cov bug
2021-04-21T14:51:08+02:00
Valentina Galata, valentina.galata@uni.lu

Fix function `ave_gene_cov` in `utils.py`: https://git-r3lab.uni.lu/susheel.busi/ont_pilot_gitlab/-/blob/master/workflow/scripts/utils.py#L284
1. why using `if contig_id != ""`?: https://git-r3lab.uni.lu/susheel.busi/ont_pilot_gitlab/-/blob/master/workflow/scripts/utils.py#L299
2. no ave. cov. for genes/proteins from the last contig because the `if` statement will not be reached after the end of file
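
The second point is the usual "last group is never flushed" pattern. A minimal sketch of the fix follows; it does not reproduce the real `ave_gene_cov`. The function name `mean_cov_per_group` and the input layout (pairs of group ID and coverage value, sorted by group ID) are assumptions made for illustration only.

```python
def mean_cov_per_group(records):
    """Mean coverage per group from (group_id, coverage) pairs sorted by group_id.

    Hypothetical helper: only illustrates flushing the last group after the loop.
    """
    means = {}
    current_id = None  # None as sentinel instead of comparing against ""
    total = 0.0
    count = 0
    for group_id, cov in records:
        if group_id != current_id:
            # A new group starts: store the result of the previous one, if any.
            if current_id is not None:
                means[current_id] = total / count
            current_id, total, count = group_id, 0.0, 0
        total += cov
        count += 1
    # Without this final flush the last group (e.g. the last contig) is lost,
    # because the check inside the loop is never reached after the end of input.
    if current_id is not None:
        means[current_id] = total / count
    return means
```

Using `None` as the sentinel also avoids the question raised in the first point, since no real ID has to be compared against an empty string.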
* [x] fix the code
* [x] re-create the per-gene coverage result files
* [x] compare to prev. result files
* [x] re-run the report workflow
* [x] re-run the figure workflow
* [x] update notes
* [x] add relevant per-gene cov. files to result archive (also update its README)
* [x] re-create the result archive (update on figshare)
* [x] re-create the code archive (update on figshare)
* [x] update figures (if needed) in the manuscript

Manuscript - v2
Valentina Galata, valentina.galata@uni.lu

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/127
Manuscript submission (preprint, Genome Biology)
2021-04-27T13:52:09+02:00
Valentina Galata, valentina.galata@uni.lu

### Finalizing submission:
* [x] manuscript: feedback from
  * [x] MC
  * [x] RH
  * [x] BK
  * [x] PW
* [x] code: update README: add missing details
* [x] code: update README: how to skip the preprocessing step for GDB
* [x] code: code to create results archive w/ relevant files
* [x] code: add a tag
* [x] code: metaP data processing and credentials
* [x] raw data: submit GDB metaG data
* [x] raw data: submit GDB metaT data
* [x] raw data: submit GDB metaP data
* [x] figshare: code: create/submit code archive
* [x] figshare: results: create/submit results archive
* [x] manuscript: add data/code links
* [x] manuscript: v2 feedback from PW
* [x] manuscript: finalize (fig. res. etc.)
* [x] cover letter
### Preprint
* [x] submit to [biorxiv](https://www.biorxiv.org/)
### Submission to Genome Biology
* [x] transfer preprint to Genome Biology

Manuscript - v2
Susheel Busi
Valentina Galata, valentina.galata@uni.lu
Susheel Busi
2021-04-30

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/125
Figure: typo in barrnap kingdom names
2021-03-19T14:17:26+01:00
Valentina Galata, valentina.galata@uni.lu

Replace "Archea" with "Archaea" in [this script](https://git-r3lab.uni.lu/susheel.busi/ont_pilot_gitlab/-/blob/master/workflow_report/scripts/const.R#L100)
* [x] fix typo
* [x] recreate figures

Manuscript - v2
Valentina Galata, valentina.galata@uni.lu

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/124
Bug: bbmap: quality encoding offset for LR (GDB, preprocessing)
2021-03-24T07:22:12+01:00
Valentina Galata, valentina.galata@uni.lu

Using `bbmap`'s parameters `ignorebadquality qin=64 qout=64` for long reads when processing them appears to be wrong,
i.e. need to change it in [this line](https://git-r3lab.uni.lu/susheel.busi/ont_pilot_gitlab/-/blob/master/workflow/rules/preprocessing.smk#L170) and update the results.
Since this rule is used for GDB only to remove host contamination, only the LR/HY results for GDB will need to be updated.
Proof:
- quality string changed between input FASTQ and output FASTQ files: some characters replaced by `@`
- `testformat2.sh` from `bbmap` tools reports a quality offset of 33 but fails or prints warnings if it is not set or set to 64
Checking file format:
```bash
testformat2.sh trim=f sketch=f merge=f /scratch/users/vgalata/gdb/basecalling/lr.fastq.gz
# Warning! Changed from ASCII-33 to ASCII-64 on input 8: 56 -> 25
# Up to 641 prior reads may have been generated with incorrect qualities.
# If this is a problem you may wish to re-run with the flag 'qin=33' or 'qin=64'.
#
# The ASCII quality encoding offset (64) is not set correctly, or the reads are corrupt; quality value below -5.
# Please re-run with the flag 'qin=33', 'ignorebadquality', or '-da'.
# Problematic read number 641:
# [...]
# Offset=64
# java.lang.Exception: Aborting.
# [...]
```
```bash
testformat2.sh qin=33 trim=f sketch=f merge=f /scratch/users/vgalata/gdb/basecalling/lr.fastq.gz
# Format fastq
# Compression gz
# Interleaved false
# [...]
# QualOffset 33
# [...]
```
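
The two observations from the proof can be illustrated with a small sketch. It is not how `bbmap` works internally: `guess_phred_offset` is a hypothetical heuristic, and `reencode_clamped` only assumes that out-of-range qualities are clamped to 0 before being written, which would explain why characters end up as `@` (ASCII 64, i.e. quality 0 at offset 64).

```python
def guess_phred_offset(quality_strings):
    """Heuristic guess of the ASCII quality offset (hypothetical helper).

    Phred+33 starts at '!' (33); Phred+64 starts at '@' (64) and the older
    Solexa+64 scale at ';' (59). Any character below 59 therefore implies 33.
    """
    lowest = min(ord(ch) for quality in quality_strings for ch in quality)
    # Values of 59 or above are not conclusive; 64 is only assumed in that case.
    return 33 if lowest < 59 else 64


def reencode_clamped(quality, qin, qout):
    """Decode with offset qin, clamp negative scores to 0, encode with qout.

    Assumption for illustration: negative scores are clamped rather than rejected.
    """
    scores = [max(ord(ch) - qin, 0) for ch in quality]
    return "".join(chr(score + qout) for score in scores)


# A Phred+33 string read with the wrong offset: low characters collapse to '@'.
print(guess_phred_offset(["II5+#", "FFFF"]))
print(reencode_clamped("I5h", qin=64, qout=64))
```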
**TODOs**
* [x] change parameters in the rule
* [x] remove preprocessed LR files (link `lr.proc.fastq.gz` and file `lr.nohost.fastq.gz`) for GDB
* [x] re-run "preprocessing" for LR for GDB
* [x] re-run "assembly" for GDB
* [x] re-run "mapping" for GDB
* [x] re-run "annotation" for GDB
* [x] re-run "analysis" for GDB
* [x] re-run "taxonomy" for GDB
* [x] re-create reports
* [x] re-create GDB extra-analysis: `rgi`
* [x] re-create GDB extra-analysis: `barrnap`, metaT
* [x] re-create GDB extra-analysis: metaT ave. cov. of unique `mmseqs2` proteins
* [x] re-create metaP results
* [x] re-create paper figures

Manuscript - v2
Valentina Galata, valentina.galata@uni.lu

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/123
Figure: GDB, nudged RGI hit to ARO:30004454 in Flye
2021-03-02T15:05:25+01:00
Valentina Galata, valentina.galata@uni.lu

Figure to show the one nudged hit to `ARO:30004454` in `Flye` in sample `GDB`.
See also notes `notes/gdb_rgi_aro3004454_flye.md`.
- metaT coverage
- sequences of ARO, Prodigal's protein, "new" protein

Manuscript - v2
Valentina Galata, valentina.galata@uni.lu

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/121
Data: mean metaT cov for "unique" genes/proteins (mmseqs2, GDB)
2021-02-26T12:31:17+01:00
Valentina Galata, valentina.galata@uni.lu

Collect the data: ave. metaT coverage for the genes/proteins identified as "unique" using `mmseqs2`.
> unique proteins = proteins from a cluster which contains only proteins from one assembly
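
The definition of "unique" can be sketched as follows. The input layout is an assumption: `cluster_pairs` stands for (representative, member) pairs as found in a two-column cluster table, and `assembly_of` is a hypothetical lookup from protein ID to assembly name. How the real workflow derives either is not shown here.

```python
from collections import defaultdict


def unique_proteins(cluster_pairs, assembly_of):
    """Return proteins from clusters whose members all come from one assembly.

    cluster_pairs: iterable of (representative_id, member_id) pairs (assumed layout).
    assembly_of: dict mapping protein ID to assembly name (hypothetical lookup).
    """
    # Collect the members of each cluster under its representative.
    clusters = defaultdict(set)
    for representative, member in cluster_pairs:
        clusters[representative].add(member)

    unique = set()
    for members in clusters.values():
        assemblies = {assembly_of[member] for member in members}
        # A cluster is "unique" if only a single assembly contributes to it.
        if len(assemblies) == 1:
            unique.update(members)
    return unique
```

Note that under this definition a cluster with a single member always counts as unique.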

Manuscript - v2
Valentina Galata, valentina.galata@uni.lu

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/37
Version info for relevant data files
2021-04-20T06:48:22+02:00
Valentina Galata, valentina.galata@uni.lu

For all relevant data files include/generate
- version tag (if applicable)
- download date (if applicable)
- `md5sum`
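
One possible way to collect these three items per file is sketched below. The function names and the record layout are assumptions for illustration; only `hashlib` and `datetime` from the standard library are used.

```python
import hashlib
from datetime import date


def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def version_record(path, version_tag=None, download_date=None):
    """Hypothetical record layout: version tag and download date are optional."""
    return {
        "file": str(path),
        "version": version_tag,
        "downloaded": download_date.isoformat() if download_date else None,
        "md5": md5sum(path),
    }
```

A call such as `version_record("db.fasta", version_tag="v1", download_date=date.today())` would return one record for a (hypothetical) file `db.fasta`; the records could then be written to a table next to the data.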

Manuscript - v2

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/36
Deposit code at Zenodo/Figshare
2021-04-20T06:47:34+02:00
Valentina Galata, valentina.galata@uni.lu

Deposit the code at [Zenodo](https://zenodo.org/).

Manuscript - v2

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/21
Figure: experimental design
2020-11-16T08:26:23+01:00
Susheel Busi

- Beautify the `experimental design` figure
- Currently no figure exists
- @susheel.busi will make one over the next few days
- @valentina.galata can then tweak it (maybe)

Manuscript - v2
Susheel Busi