IMP3 issueshttps://git-r3lab.uni.lu/IMP/imp3/-/issues2021-11-15T08:33:46+01:00https://git-r3lab.uni.lu/IMP/imp3/-/issues/53Refactoring2021-11-15T08:33:46+01:00pedro queirospedro.queiros@uni.luRefactoringOverhaulpedro queirospedro.queiros@uni.lupedro queirospedro.queiros@uni.luhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/47Conda envs: smaller YAML files2022-02-04T14:55:27+01:00Valentina Galatavalentina.galata@uni.luConda envs: smaller YAML filesI would like to propose to have smaller `conda` env. YAML files instead of the big monolithic per-step files being currently used.
According to `snakemake`'s [documentation](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#distribution-and-reproducibility) the `conda` environments should be
> as finegrained as possible to improve transparency and maintainability.
In many cases, it makes sense to have per-rule or per-tool YAML files, e.g. for `MEGAHIT` or `metaSPAdes`.
This should be discussed and decided per step and rule, so that the same tool is not installed in several environments.OverhaulValentina Galatavalentina.galata@uni.luValentina Galatavalentina.galata@uni.luhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/45Unit testing, CI and merging2021-10-26T10:11:40+02:00Valentina Galatavalentina.galata@uni.luUnit testing, CI and mergingConsider including:
- [Snakemake unit tests](https://snakemake.readthedocs.io/en/stable/snakefiles/testing.html#snakefiles-testing) (at least for some steps/rules)
- Continuous Integration (which is currently not working)
- [Fast Forward Merge](https://docs.gitlab.com/ee/user/project/merge_requests/fast_forward_merge.html)
The current test data set might be too big for this purpose.
We might need another small and simple dummy test dataset.
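A small dummy dataset could be generated on the fly instead of being stored in the repository. The following is only a sketch under assumptions: the function name `write_dummy_fastq`, its parameters and the read-naming scheme are hypothetical and not part of IMP3. Random reads will not assemble into meaningful contigs, so data like this would only exercise early steps such as preprocessing.

```python
import gzip
import random


def write_dummy_fastq(path, n_reads=100, read_len=50, seed=0, prefix="read"):
    """Write n_reads random reads to a gzipped FASTQ file and return the count.

    Hypothetical helper for illustration only; not part of IMP3.
    """
    rng = random.Random(seed)  # fixed seed keeps the test data reproducible
    with gzip.open(path, "wt") as handle:
        for i in range(n_reads):
            seq = "".join(rng.choice("ACGT") for _ in range(read_len))
            qual = "I" * read_len  # constant quality string, one character per base
            handle.write(f"@{prefix}_{i}\n{seq}\n+\n{qual}\n")
    return n_reads
```

Calling it twice with different seeds and file names (for example one file for R1 and one for R2) would give two files with matching read names, which may be enough for a smoke test of rules that only check the file format.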
*TODOs: TBD*OverhaulValentina Galatavalentina.galata@uni.luValentina Galatavalentina.galata@uni.luhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/43Formatting convention2021-10-08T13:45:35+02:00Valentina Galatavalentina.galata@uni.luFormatting convention### Formatting convention
Define a formatting convention and clean up the code according to the new guidelines.
The formatting convention is defined in the [Wiki](https://git-r3lab.uni.lu/IMP/imp3/-/wikis/Code-formatting-convention).
To define:
- rule structure template/guidelines
- Python:
- [PEP8](https://www.python.org/dev/peps/pep-0008/)
- 4 spaces instead of tabs
- double or single quotes?
- comments: reStructuredText or NumPy/SciPy?
- settings access
- either only global variables or `config`
- if global variables, then these should be defined in one place
To try:
- code quality checker (linter): `snakemake --lint`
- highlights issues to be solved to follow best practices
- highly recommended before publishing workflows
- automatic formatter: [Snakefmt](https://github.com/snakemake/snakefmt), based on [Black](https://black.readthedocs.io/en/stable/)
References:
- [Python docstring formats](https://realpython.com/documenting-python-code/#docstring-formats)
### Code "cleaning"
- changes w.r.t. formatting conventions mentioned above
- Python code
- `snakemake` rules
- (other scripts)
- add comments where possible
- one place for all helper functions
- no variables/functions inside rule files
- use `os.path` utils when working with paths
- replace code duplication with custom functions
- consistent indentation
- Q: can we simplify the structure in `workflow/rules/` ???OverhaulValentina Galatavalentina.galata@uni.luValentina Galatavalentina.galata@uni.luhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/41Configuration: configs, profiles and launchers2022-04-19T13:41:44+02:00Valentina Galatavalentina.galata@uni.luConfiguration: configs, profiles and launchersFollow the recommendations from `snakemake` and the [best practices](https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html) to update and restructure the current configuration setup:
- `snakemake` config
- move runtime specific configuration to `snakemake` profile
- `snakemake` profile
- to set default values for command line options
- `slurm` config
- (cluster configuration has been officially deprecated)
- incl. in `snakemake` profile
- ~~shell-variables config~~
- will be part of `snakemake` profile
- ~~launcher scripts~~
- all parameters should be given via `snakemake` profiles
### References
- [Snakemake: profiles](https://snakemake.readthedocs.io/en/stable/executing/cli.html?highlight=profile#profiles)
- [Template Snakemake profile for SLURM](https://github.com/Snakemake-Profiles/slurm)OverhaulValentina Galatavalentina.galata@uni.luValentina Galatavalentina.galata@uni.luhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/38nonpareil: simplify the rule2022-02-10T07:35:22+01:00Valentina Galatavalentina.galata@uni.lunonpareil: simplify the ruleSimplify the rule `nonpareil` in [nonpareil.smk](https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/Preprocessing/nonpareil.smk) and improve the formatting if possible.
For future reference:
- `nonpareil` does not take gzipped FASTQ files: see [issue 35](https://github.com/lmrodriguezr/nonpareil/issues/35) --> might get fixed in new version
- `nonpareil` runs are not completely reproducible: see [issue 48](https://github.com/lmrodriguezr/nonpareil/issues/48) --> might get fixed in new versionOverhaulValentina Galatavalentina.galata@uni.luValentina Galatavalentina.galata@uni.luhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/35Config: use appropriate in-built data types2021-09-24T10:06:46+02:00Valentina Galatavalentina.galata@uni.luConfig: use appropriate in-built data types* [ ] use lists instead of strings if multiple values should/can be given for an attribute
- `steps`
- `input` files for MG/MT
- `summary_steps`
- `filtering`: `filter`
- `sortmerna`: `files`
- `hmm_DBs`
- `COGS`
- `binners`
* [ ] use a boolean instead of a string for `true/false` attributes
- e.g. `settingsLocked`
- edit: use `True` or `False` in the `snakemake` command when overwriting parameters with `--config`
* [ ] update/expand config validation schemaOverhaulValentina Galatavalentina.galata@uni.luValentina Galatavalentina.galata@uni.luhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/24Bug: config tmp folder is ignored during rRNA filtering2021-10-08T11:12:27+02:00Valentina Galatavalentina.galata@uni.luBug: config tmp folder is ignored during rRNA filteringThe given tmp folder path (i.e. the one specified in the `snakemake` config file) seems to be ignored by `mktemp` in the rRNA filtering rule: https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/Preprocessing/rna.filtering.smk#L10.
Although the `mktemp` command looks correct during execution, the path actually used is `/tmp`, which might cause issues for big FASTQ files.
As long as this issue is not resolved, the problem can be avoided by setting the shell variable `TMPDIR` when running `snakemake`:
```bash
TMPDIR="path/to/tmp/" snakemake ...
```
OverhaulValentina Galatavalentina.galata@uni.luValentina Galatavalentina.galata@uni.luhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/63Mantis GFF script: decrease runtime for big samples2022-09-07T17:15:42+02:00Valentina Galatavalentina.galata@uni.luMantis GFF script: decrease runtime for big samplesSpeed up the code in [mantis_gff.py](https://gitlab.lcsb.uni.lu/IMP/imp3/-/blob/issue59/workflow/scripts/Analysis/mantis_gff.py) for big samples with many annotations.Analysis & Binning updatesValentina Galatavalentina.galata@uni.luValentina Galatavalentina.galata@uni.luhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/57Binning: Rework binning module2023-11-08T12:09:26+01:00Valentina Galatavalentina.galata@uni.luBinning: Rework binning module- Overhaul binning module
- Replace current `binny` implementation with its new version: https://github.com/a-h-b/binnyAnalysis & Binning updatesOskar HicklOskar Hicklhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/23PathoFact: include latest version2022-02-25T14:27:44+01:00Valentina Galatavalentina.galata@uni.luPathoFact: include latest versionAn older version of `PathoFact` is included in branch [IMPiris_VG](https://git-r3lab.uni.lu/IMP/imp3/-/tree/IMPiris_VG).
However, the tool has since been modified and updated for manuscript revisions.
Update the code to include the latest version of `PathoFact` (in a separate branch).Analysis & Binning updatesValentina Galatavalentina.galata@uni.luValentina Galatavalentina.galata@uni.luhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/15MANTIS: setup and rules2022-03-02T16:08:45+01:00Valentina Galatavalentina.galata@uni.luMANTIS: setup and rules[MANTIS](https://github.com/PedroMTQ/mantis) setup
branch: https://git-r3lab.uni.lu/IMP/imp3/-/tree/mantis
* [x] `git submodule`
* [x] `conda` env.
* [x] check installation
* [x] add rules, config parameters, etc.
* [x] test run (incl. binning)
* [x] update the [mantis branch](https://git-r3lab.uni.lu/IMP/imp3/-/tree/mantis)
* [x] switch from GitLab repo to GitHub repo
* [x] update the code (see issue #22)
* [x] databases download/setup (see issue #22)
* [x] replace `git submodule` by `conda` installation (i.e. env. YAML file)
* [ ] replace HMM search w/ `essential` HMM by `mantis` annotation ([list of genes in Mantis](https://github.com/PedroMTQ/mantis/tree/master/Resources/essential_genes)) --> updated version: `1.3`
- *pending*: an update of `binny` might make this task obsoleteAnalysis & Binning updatespedro queirospedro.queiros@uni.luValentina Galatavalentina.galata@uni.lupedro queirospedro.queiros@uni.luhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/11Assembly: merging/consensus: replace CAP32021-03-17T06:51:14+01:00Valentina Galatavalentina.galata@uni.luAssembly: merging/consensus: replace CAP3Replace `CAP3` in the consensus assembly step:
- rule `collapse_hybrid_assemblies` in `rules/Assembly/hybrid/merge-hybrid-assembly.smk`
- rule `cap3` in `rules/Assembly/common/merge-assembly.smk`AssemblyValentina Galatavalentina.galata@uni.luValentina Galatavalentina.galata@uni.luhttps://git-r3lab.uni.lu/IMP/imp3/-/issues/64IMP Test Runs2023-11-13T18:43:49+01:00Ricardo PariseIMP Test Runs- [ ] simple test running on all machines, example: bigmem
- [ ] set up tests for the different input modalities
- [ ] document this on the readthedocsRicardo PariseRicardo Parisehttps://git-r3lab.uni.lu/IMP/imp3/-/issues/56Error in rule merge_assembly_cap3 - segmentation fault2021-12-07T07:59:46+01:00javiercnavError in rule merge_assembly_cap3 - segmentation fault## Bug report
IMP3 stopped at the `merge_assembly_cap3` step without writing anything to the corresponding log file; the terminal showed the following message:
Error in rule merge_assembly_cap3:
jobid: 9
output: Assembly/mg.assembly.merged.fa
log: logs/assembly_merge_assembly_cap3.log (check log file(s) for error message)
conda-env: /users/jcnavarro/conda/e7a8faf3f71c32a40cabb702d36f2415
shell:
NAME_fin=Assembly/mg.assembly
NAME=Assembly/intermediary/mg.assembly
cat Assembly/intermediary/mg.megahit_preprocessed.1.fa Assembly/intermediary/mg.megahit_unmapped.2.fa > $NAME.cat.fa
# Run cap3
cap3 $NAME.cat.fa -p 98 -o 100 > logs/assembly_merge_assembly_cap3.log 2>&1
# Concatenate assembled contigs, singletons and rename the contigs
cat $NAME.cat.fa.cap.contigs $NAME.cat.fa.cap.singlets | awk '/^>/{print ">T111_contig_" ++i; next}{print}' > $NAME_fin.merged.fa
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /mnt/san.ese/nfs/users/jcnavarro/.snakemake/log/2021-11-23T110427.419676.snakemake.log
### Log files and screenshots
see attachments
(List of logs/screenshots)
## Steps to reproduce
I ran IMP3 with a smaller dataset and it ran well.
**Code version**
```txt
commit hash
branch name
```
**Config files**
<!-- Attach used and created config files. -->
[terraces.yaml](/uploads/68a438d10a822355a43eee320154c814/terraces.yaml)
[2021-11-23T110427.419676.snakemake.log](/uploads/ac4bb93e8191ec4ab998e0065109e18e/2021-11-23T110427.419676.snakemake.log)
![Screen_shot_error](/uploads/91968b384185bdf63cce17228eebc1e6/Screen_shot_error.png)
![Screen_Shot_logs](/uploads/9054d4c57ab11de8f233f57b35bbbca6/Screen_Shot_logs.png)
[sample.config.yaml](/uploads/b82ef66a49d99e0208ecc24a0e0d8523/sample.config.yaml)
(List of config files)
**Command**
<!-- Attach used launcher script and/or provide the command below. -->
```bash
# command used to launch IMP3
snakemake -s /users/jcnavarro/IMP3/Snakefile --configfile /users/jcnavarro/playing.yaml --use-conda --conda-prefix /users/jcnavarro/conda/ --cores 20
```
**Input**
<!-- Provide relevant information about your input files. -->
R1/R2 fastq.gz for the metagenome of a soil sample. Each has a size of 20 Gb when compressed.
**System**
<!-- Short description of the system setup where you are running `IMP3`. -->
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
MemTotal: 528365004 kB
Threads/core: 64
## Possible fixes
<!-- If you can, link to the file or line of code that might be responsible for the problem - please make sure that the linked file corresponds to the indicated version of the code. -->
(Optional: How to fix)
--------------------------------------------------https://git-r3lab.uni.lu/IMP/imp3/-/issues/55Job fails due to conda environment installation2021-12-08T10:36:25+01:00michougJob fails due to conda environment installation## Bug report
<!-- Describe the bug/error you have encountered. -->
Hi,
When launching IMP3 jobs on the iris cluster with the attached files, this error occurs in some cases:
[GL100_DN.launchIMP.sh](/uploads/cb56a27e3e15b0958c4d9f2f46800a53/GL100_DN.launchIMP.sh)
[GL100_DN.runIMP.sh](/uploads/978beb0ffa116de85ab40228362ec25d/GL100_DN.runIMP.sh)
[GL100_DN_config.yaml](/uploads/eff8270c0d05fa76b89dbf6245123e74/GL100_DN_config.yaml)
```
/home/users/gmichoud/IMP3/.
Building DAG of jobs...
Creating conda environment /home/users/gmichoud/IMP3/workflow/rules/ini/../../envs/IMP_binning.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/users/gmichoud/IMP3/workflow/rules/ini/../../envs/IMP_binning.yaml:
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working...
####################################################################################
das_tool version 1.1.1-2 has been successfully installed!
This software by default runs with USEARCH. You can install it from the following links or use DIAMOND '--search_engine diamond'
> Download: http://www.drive5.com/usearch/download.html
> Installation instruction: http://www.drive5.com/usearch/manual/install.html
done
ERROR conda.core.link:_execute(699): An error occurred while installing package 'conda-forge::fribidi-1.0.5-h516909a_1002'.
Rolling back transaction: ...working... done
CondaError: Cannot link a source that does not exist. /home/users/gmichoud/IMP3/conda/4788f34499eae03fa997bfd1ba6c307c/.condatmp/e7595ab0-dd9c-485f-beb1-48de0bac8174
Running `conda clean --packages` may resolve your problem.
()
```
The Python version I use is 3.9.1, installed via Miniconda.
The Git branch is up to date (as of yesterday).
Any ideas?
Best
Greghttps://git-r3lab.uni.lu/IMP/imp3/-/issues/52Assembly/Analysis: replace perl scripts for contig stats2021-10-05T11:08:30+02:00Valentina Galatavalentina.galata@uni.luAssembly/Analysis: replace perl scripts for contig stats## Feature request
To simplify the code (and avoid hard-coded paths in the CMD), replace the Perl scripts used in [contig-length.smk](https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/Assembly/common/contig-length.smk) to compute contig length and GC content with existing utilities.
For example, `seqkit` provides a utility to do that (see sub-command `fx2tab` [here](https://bioinf.shenwei.me/seqkit/usage/)) and this tool is already included in multiple `conda` environments.https://git-r3lab.uni.lu/IMP/imp3/-/issues/51Preprocessing: kneaddata for reads filtering2021-09-30T13:10:26+02:00Valentina Galatavalentina.galata@uni.luPreprocessing: kneaddata for reads filtering## Feature request
I propose considering `kneaddata` for read filtering.
> This tool aims to perform principled in silico separation of bacterial reads from these "contaminant" reads, be they from the host, from bacterial 16S sequences, or other user-defined sources.
- can be installed via `conda`
- can use multiple references for filtering
- outputs reads mapped to each given reference in separate FASTQ files
- (runs `fastqc` for the input/output FASTQ files)
The rRNA filtering step could be included there as well or it could still be a separate rule.
With or without the rRNA filtering, this would reduce the code complexity considerably: there would be no need for those "chained" FASTQ files with multiple filtering-suffixes in their names.
The trimming step included in `kneaddata` can and has to be skipped because of the optional poly-G trimming which has to be done prior to filtering.
`kneaddata`:
- [web site](https://huttenhower.sph.harvard.edu/kneaddata/)
- [repo](https://github.com/biobakery/kneaddata)
- [tutorials](https://github.com/biobakery/biobakery/wiki/kneaddata#paired-end-reads)
- [forum](https://forum.biobakery.org/c/Infrastructure-and-utilities/KneadData/8)https://git-r3lab.uni.lu/IMP/imp3/-/issues/50Assembly stats: metaQUAST2021-09-27T08:04:14+02:00Valentina Galatavalentina.galata@uni.luAssembly stats: metaQUAST## Feature request
Should we include `metaQUAST` to get the basic statistics for the assembly/assemblies?
- [paper](https://academic.oup.com/bioinformatics/article/32/7/1088/1743987)
- [repo](https://github.com/ablab/quast)https://git-r3lab.uni.lu/IMP/imp3/-/issues/48Unexpected behavior: `ancient` keyword may trigger Snakemake scheduler issues2021-09-16T00:33:24+02:00Susheel BusiUnexpected behavior: `ancient` keyword may trigger Snakemake scheduler issues@valentina.galata and @anna.buschart
- When running metaG samples through `imp3` [commit ecead78], I noticed the following message/warning in the SLURM log file:
```
Failed to solve scheduling problem with ILP solver in time (10s). Falling back to greedy solver.
```
- While it doesn't cause any issue per se, I noticed that the jobs are taking longer to complete. I can't necessarily confirm this yet, and the sample is still running, but I've attached the [SLURM](/uploads/216b227eb3b2a5a7363fc1934cdeee29/slurm-2484978.out) log.
- A little digging revealed an issue with `snakemake` when the keyword `ancient` is used. See here: https://github.com/snakemake/snakemake/issues/946
- I noticed that `imp3` uses this keyword in the [function.definitions rule](https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/function.definitions.smk)
Something to keep an eye on for the future ;)
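To keep an eye on it, the SLURM logs could be checked for the warning quoted above. This is a minimal sketch, not something that exists in `imp3`: the function name `count_ilp_fallbacks` is hypothetical, and the matched text is simply the beginning of the warning as it appeared in my log, so it may differ in other `snakemake` versions.

```python
# Start of the warning as quoted from the SLURM log in this issue.
FALLBACK_MESSAGE = "Failed to solve scheduling problem with ILP solver"


def count_ilp_fallbacks(log_lines):
    """Count the lines that report an ILP scheduler timeout.

    Hypothetical helper; accepts any iterable of strings, e.g. an open file.
    """
    return sum(1 for line in log_lines if FALLBACK_MESSAGE in line)
```

Passing an open log file (for example the attached `slurm-2484978.out`) returns the number of times the scheduler fell back to the greedy solver during that run.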