IMP3 issues (https://git-r3lab.uni.lu/IMP/imp3/-/issues), updated 2023-11-13T18:43:49+01:00

# Issue #64: IMP Test Runs

https://git-r3lab.uni.lu/IMP/imp3/-/issues/64 · Updated: 2023-11-13T18:43:49+01:00 · Author: Ricardo Parise

- [ ] simple test running on all machines, example: bigmem
- [ ] set up tests for the different input modalities
- [ ] document this on the readthedocs

Assignee: Ricardo Parise

# Issue #63: Mantis GFF script: decrease runtime for big samples

https://git-r3lab.uni.lu/IMP/imp3/-/issues/63 · Updated: 2022-09-07T17:15:42+02:00 · Author: Valentina Galata (valentina.galata@uni.lu)

Speed up the code in [mantis_gff.py](https://gitlab.lcsb.uni.lu/IMP/imp3/-/blob/issue59/workflow/scripts/Analysis/mantis_gff.py) for big samples with many annotations.

Milestone: Analysis & Binning updates · Assignee: Valentina Galata (valentina.galata@uni.lu)

# Issue #57: Binning: Rework binning module

https://git-r3lab.uni.lu/IMP/imp3/-/issues/57 · Updated: 2023-11-08T12:09:26+01:00 · Author: Valentina Galata (valentina.galata@uni.lu)

- Overhaul binning module
- Replace current `binny` implementation with its new version: https://github.com/a-h-b/binny

Milestone: Analysis & Binning updates · Assignee: Oskar Hickl

# Issue #56: Error in rule merge_assembly_cap3 - segmentation fault

https://git-r3lab.uni.lu/IMP/imp3/-/issues/56 · Updated: 2021-12-07T07:59:46+01:00 · Author: javiercnav

## Bug report
IMP3 stopped at the `merge_assembly_cap3` step without writing anything to the corresponding log file. Only the following message was shown on screen:

    Error in rule merge_assembly_cap3:
        jobid: 9
        output: Assembly/mg.assembly.merged.fa
        log: logs/assembly_merge_assembly_cap3.log (check log file(s) for error message)
        conda-env: /users/jcnavarro/conda/e7a8faf3f71c32a40cabb702d36f2415
        shell:
            NAME_fin=Assembly/mg.assembly
            NAME=Assembly/intermediary/mg.assembly
            cat Assembly/intermediary/mg.megahit_preprocessed.1.fa Assembly/intermediary/mg.megahit_unmapped.2.fa > $NAME.cat.fa
            # Run cap3
            cap3 $NAME.cat.fa -p 98 -o 100 > logs/assembly_merge_assembly_cap3.log 2>&1
            # Concatenate assembled contigs, singletons and rename the contigs
            cat $NAME.cat.fa.cap.contigs $NAME.cat.fa.cap.singlets | awk '/^>/{print ">T111_contig_" ++i; next}{print}' > $NAME_fin.merged.fa
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: /mnt/san.ese/nfs/users/jcnavarro/.snakemake/log/2021-11-23T110427.419676.snakemake.log

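The last command of the rule concatenates the CAP3 contigs and singlets and renumbers every FASTA header with `awk`. For readers less familiar with `awk`, the following minimal Python sketch reproduces that renaming step. It is an illustration, not code from IMP3. The prefix `T111_contig_` is copied from the message above, and the script and function names are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def rename_contigs(lines: Iterable[str], prefix: str = "T111_contig_") -> Iterator[str]:
    """Replace each FASTA header with <prefix><n>, numbering from 1.

    Same logic as: awk '/^>/{print ">T111_contig_" ++i; next}{print}'
    """
    counter = 0
    for line in lines:
        if line.startswith(">"):
            counter += 1  # awk's ++i increments before printing, so the first header gets 1
            yield f">{prefix}{counter}\n"
        else:
            yield line.rstrip("\n") + "\n"  # awk's print always ends the line with a newline


def concat_lines(paths: Iterable[str]) -> Iterator[str]:
    """Stream the lines of several files in order, like `cat file1 file2`."""
    for path in paths:
        with open(path) as handle:
            yield from handle


if __name__ == "__main__":
    # usage: python rename_contigs.py OUT.fa IN1.fa [IN2.fa ...]
    out_path, in_paths = sys.argv[1], sys.argv[2:]
    with open(out_path, "w") as out:
        # one counter across all inputs, because awk sees a single concatenated stream
        out.writelines(rename_contigs(concat_lines(in_paths)))
```

With the paths from the rule, the inputs would be `$NAME.cat.fa.cap.contigs` and `$NAME.cat.fa.cap.singlets`, and the output would be `$NAME_fin.merged.fa`.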
### Log files and screenshots
See attachments.
## Steps to reproduce
I ran IMP3 with a smaller dataset and it ran without errors.
**Code version**
```txt
commit hash
branch name
```
**Config files**
<!-- Attach used and created config files. -->
[terraces.yaml](/uploads/68a438d10a822355a43eee320154c814/terraces.yaml)
[2021-11-23T110427.419676.snakemake.log](/uploads/ac4bb93e8191ec4ab998e0065109e18e/2021-11-23T110427.419676.snakemake.log)
![Screen_shot_error](/uploads/91968b384185bdf63cce17228eebc1e6/Screen_shot_error.png)
![Screen_Shot_logs](/uploads/9054d4c57ab11de8f233f57b35bbbca6/Screen_Shot_logs.png)
[sample.config.yaml](/uploads/b82ef66a49d99e0208ecc24a0e0d8523/sample.config.yaml)
**Command**
<!-- Attach used launcher script and/or provide the command below. -->
```bash
# command used to launch IMP3
snakemake -s /users/jcnavarro/IMP3/Snakefile --configfile /users/jcnavarro/playing.yaml --use-conda --conda-prefix /users/jcnavarro/conda/ --cores 20
```
**Input**
<!-- Provide relevant information about your input files. -->
Paired-end (R1/R2) `fastq.gz` files of a soil metagenome; each file is about 20 GB compressed.
**System**
<!-- Short description of the system setup where you are running `IMP3`. -->

    PRETTY_NAME="Debian GNU/Linux 10 (buster)"
    NAME="Debian GNU/Linux"
    VERSION_ID="10"
    VERSION="10 (buster)"
    VERSION_CODENAME=buster
    ID=debian
    HOME_URL="https://www.debian.org/"
    SUPPORT_URL="https://www.debian.org/support"
    BUG_REPORT_URL="https://bugs.debian.org/"
    MemTotal: 528365004 kB
    Threads/core: 64
## Possible fixes
<!-- If you can, link to the file or line of code that might be responsible for the problem - please make sure that the linked file corresponds to the indicated version of the code. -->
# Issue #55: Job fails due to conda environment installation

https://git-r3lab.uni.lu/IMP/imp3/-/issues/55 · Updated: 2021-12-08T10:36:25+01:00 · Author: michoug

## Bug report
<!-- Describe the bug/error you have encountered. -->
Hi,
When I launch IMP3 jobs on the iris cluster with the attached files, the following error occurs in some cases:
[GL100_DN.launchIMP.sh](/uploads/cb56a27e3e15b0958c4d9f2f46800a53/GL100_DN.launchIMP.sh)
[GL100_DN.runIMP.sh](/uploads/978beb0ffa116de85ab40228362ec25d/GL100_DN.runIMP.sh)
[GL100_DN_config.yaml](/uploads/eff8270c0d05fa76b89dbf6245123e74/GL100_DN_config.yaml)
```
/home/users/gmichoud/IMP3/.
Building DAG of jobs...
Creating conda environment /home/users/gmichoud/IMP3/workflow/rules/ini/../../envs/IMP_binning.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/users/gmichoud/IMP3/workflow/rules/ini/../../envs/IMP_binning.yaml:
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working...
####################################################################################
das_tool version 1.1.1-2 has been successfully installed!
This software by default runs with USEARCH. You can install it from the following links or use DIAMOND '--search_engine diamond'
> Download: http://www.drive5.com/usearch/download.html
> Installation instruction: http://www.drive5.com/usearch/manual/install.html
done
ERROR conda.core.link:_execute(699): An error occurred while installing package 'conda-forge::fribidi-1.0.5-h516909a_1002'.
Rolling back transaction: ...working... done
CondaError: Cannot link a source that does not exist. /home/users/gmichoud/IMP3/conda/4788f34499eae03fa997bfd1ba6c307c/.condatmp/e7595ab0-dd9c-485f-beb1-48de0bac8174
Running `conda clean --packages` may resolve your problem.
()
```
I use Python 3.9.1, installed via Miniconda.
I am on the latest version of the git branch (as of yesterday).
Any ideas?
Best
Greg

# Issue #53: Refactoring

https://git-r3lab.uni.lu/IMP/imp3/-/issues/53 · Updated: 2021-11-15T08:33:46+01:00 · Author: pedro queiros (pedro.queiros@uni.lu)

Milestone: Overhaul · Assignee: pedro queiros (pedro.queiros@uni.lu)

# Issue #52: Assembly/Analysis: replace perl scripts for contig stats

https://git-r3lab.uni.lu/IMP/imp3/-/issues/52 · Updated: 2021-10-05T11:08:30+02:00 · Author: Valentina Galata (valentina.galata@uni.lu)

## Feature request
To simplify the code (and avoid hard-coded paths in the CMD), replace the Perl scripts used in [contig-length.smk](https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/Assembly/common/contig-length.smk) to compute contig length and GC content with existing utilities.
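The following minimal Python sketch shows what a replacement has to produce for each contig. It is an illustration, not the existing Perl scripts. It assumes that GC content means the share of G and C among all bases; the Perl scripts or `seqkit` may handle `N` or other ambiguous bases differently. With `seqkit`, the equivalent call would be along the lines of `seqkit fx2tab -n -l -g <contigs>` (name, length, GC), but the flags should be checked against the installed version. The script name is hypothetical.

```python
import sys
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig ID, sequence) pairs; sequences may span several lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # ID = first word of the header line
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_and_gc(seq: str) -> Tuple[int, float]:
    """Return (length, GC content in %), counting G and C over all bases."""
    length = len(seq)
    upper = seq.upper()
    gc = upper.count("G") + upper.count("C")
    return length, (100.0 * gc / length if length else 0.0)


if __name__ == "__main__":
    # usage: python contig_stats.py contigs.fa  -> tab-separated ID, length, GC%
    with open(sys.argv[1]) as handle:
        for contig_id, seq in read_fasta(handle):
            length, gc = length_and_gc(seq)
            print(f"{contig_id}\t{length}\t{gc:.2f}")
```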
For example, `seqkit` provides a utility to do that (see sub-command `fx2tab` [here](https://bioinf.shenwei.me/seqkit/usage/)), and this tool is already included in multiple `conda` environments.

# Issue #51: Preprocessing: kneaddata for reads filtering

https://git-r3lab.uni.lu/IMP/imp3/-/issues/51 · Updated: 2021-09-30T13:10:26+02:00 · Author: Valentina Galata (valentina.galata@uni.lu)

## Feature request
I would propose considering `kneaddata` for read filtering.
> This tool aims to perform principled in silico separation of bacterial reads from these "contaminant" reads, be they from the host, from bacterial 16S sequences, or other user-defined sources.
- can be installed via `conda`
- can use multiple references for filtering
- outputs reads mapped to each given reference in separate FASTQ files
- (runs `fastqc` for the input/output FASTQ files)
The rRNA filtering step could be included there as well or it could still be a separate rule.
With or without the rRNA filtering, this would reduce the code complexity considerably: there would be no need for those "chained" FASTQ files with multiple filtering-suffixes in their names.
The trimming step included in `kneaddata` can, and has to, be skipped, because the optional poly-G trimming has to be done before filtering.
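Some context on the poly-G trimming mentioned above: on two-colour Illumina instruments (e.g. NovaSeq, NextSeq), a missing signal is called as G, which can leave runs of G at the 3' end of reads. The following minimal Python sketch shows what such a trimming step does. It assumes exact G runs and an arbitrary example threshold of 10 bases; dedicated tools such as `fastp` are more tolerant, for example of mismatches within the run. It is not the trimming code used in IMP3, and the function name is hypothetical.

```python
from typing import Tuple


def trim_poly_g(seq: str, qual: str, min_run: int = 10) -> Tuple[str, str]:
    """Cut a 3' run of G bases if it is at least `min_run` long.

    The quality string is cut to the same length as the sequence.
    """
    trimmed = seq.rstrip("Gg")
    if len(seq) - len(trimmed) >= min_run:
        return trimmed, qual[: len(trimmed)]
    return seq, qual  # run too short: keep the read unchanged
```

Such a function would be applied read by read, before the reads are mapped against the filtering references.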
`kneaddata`:
- [web site](https://huttenhower.sph.harvard.edu/kneaddata/)
- [repo](https://github.com/biobakery/kneaddata)
- [tutorials](https://github.com/biobakery/biobakery/wiki/kneaddata#paired-end-reads)
- [forum](https://forum.biobakery.org/c/Infrastructure-and-utilities/KneadData/8)

# Issue #50: Assembly stats: metaQUAST

https://git-r3lab.uni.lu/IMP/imp3/-/issues/50 · Updated: 2021-09-27T08:04:14+02:00 · Author: Valentina Galata (valentina.galata@uni.lu)

## Feature request
Should we include `metaQUAST` to get the basic statistics for the assembly/assemblies?
- [paper](https://academic.oup.com/bioinformatics/article/32/7/1088/1743987)
- [repo](https://github.com/ablab/quast)

# Issue #48: Unexpected behavior: `ancient` keyword may trigger Snakemake scheduler issues

https://git-r3lab.uni.lu/IMP/imp3/-/issues/48 · Updated: 2021-09-16T00:33:24+02:00 · Author: Susheel Busi

@valentina.galata and @anna.buschart
- When running metaG samples through `imp3` [commit ecead78], I noticed the following message/warning in the SLURM log file:
```
Failed to solve scheduling problem with ILP solver in time (10s). Falling back to greedy solver.
```
- While it doesn't cause any issue per se, the jobs seem to take longer to complete. I can't confirm this yet, as the sample is still running, but I've attached the [SLURM](/uploads/216b227eb3b2a5a7363fc1934cdeee29/slurm-2484978.out) log.
- A little digging revealed an issue with `snakemake` when the keyword `ancient` is used. See here: https://github.com/snakemake/snakemake/issues/946
- I noticed that `imp3` uses this keyword in the [function.definitions rule](https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/function.definitions.smk)
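Some context on what the keyword does: Snakemake normally re-runs a job when its output is missing or older than one of its inputs, and wrapping an input in `ancient(...)` excludes that file from the timestamp comparison. The following strongly simplified Python model of this rule is for illustration only. It assumes a single output, ignores Snakemake's other re-run triggers, and does not reproduce the scheduler warning itself. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Input:
    path: str
    mtime: float           # modification time, e.g. from os.path.getmtime()
    ancient: bool = False  # True if the rule wraps this input in ancient(...)


def needs_rerun(output_mtime: Optional[float], inputs: Sequence[Input]) -> bool:
    """Re-run if the output is missing or older than any non-ancient input."""
    if output_mtime is None:  # output does not exist yet
        return True
    return any(i.mtime > output_mtime for i in inputs if not i.ancient)
```

For example, a database that is re-downloaded (and therefore gets a newer timestamp) would not force downstream jobs to re-run if the rule marks it with `ancient`.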
Something to keep an eye on for the future ;)

# Issue #47: Conda envs: smaller YAML files

https://git-r3lab.uni.lu/IMP/imp3/-/issues/47 · Updated: 2022-02-04T14:55:27+01:00 · Author: Valentina Galata (valentina.galata@uni.lu)

I would like to propose using smaller `conda` env YAML files instead of the big monolithic per-step files that are currently used.
According to `snakemake`'s [documentation](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#distribution-and-reproducibility) the `conda` environments should be
> as finegrained as possible to improve transparency and maintainability.
In many cases, it makes sense to have per-rule or per-tool YAML files, e.g. for `MEGAHIT` or `metaSPAdes`.
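To spot tools that would end up in several of the smaller environments, the `dependencies` lists of the YAML files can be compared. The following minimal Python sketch assumes that these lists have already been loaded into a dict keyed by environment name, for example with PyYAML's `yaml.safe_load`. Nested `pip:` sections are skipped. The function names and the environment contents in the test are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Union

Dependency = Union[str, dict]  # "pkg=version" strings or a nested {"pip": [...]} entry


def package_name(spec: str) -> str:
    """Strip channel prefix and version constraint: 'bioconda::samtools>=1.9' -> 'samtools'."""
    name = spec.split("::")[-1]
    for sep in ("=", "<", ">", "!", " "):
        name = name.split(sep)[0]
    return name.strip().lower()


def shared_packages(envs: Dict[str, List[Dependency]]) -> Dict[str, Set[str]]:
    """Map each package listed in more than one environment to those environment names."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for env_name, deps in envs.items():
        for dep in deps:
            if isinstance(dep, str):  # skip nested pip sections
                owners[package_name(dep)].add(env_name)
    return {pkg: names for pkg, names in owners.items() if len(names) > 1}
```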
This could be discussed and decided for each step and rule, to avoid installing the same tools in different environments.

Milestone: Overhaul · Assignee: Valentina Galata (valentina.galata@uni.lu)

# Issue #46: Transcriptome assembly with rnaSPAdes

https://git-r3lab.uni.lu/IMP/imp3/-/issues/46 · Updated: 2021-09-13T09:31:47+02:00 · Author: Valentina Galata (valentina.galata@uni.lu)

The current rule for `rnaSPAdes` uses multiple k-mer sizes and the `--meta` option: https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/Assembly/single-omic/mt/metaspades.smk.
From the [rnaSPAdes manual v3.12.0](https://cab.spbu.ru/files/release3.12.0/rnaspades_manual.html):
> rnaSPAdes works using only a single k-mer size (automatically detected using read length by the default).
> We strongly recommend not to change this parameter.
>
> rnaSPAdes is not compatible with other pipeline options such as --meta, --sc and --plasmid. If you wish to assemble metatranscriptomic data just run rnaSPAdes as it is.

# Issue #45: Unit testing, CI and merging

https://git-r3lab.uni.lu/IMP/imp3/-/issues/45 · Updated: 2021-10-26T10:11:40+02:00 · Author: Valentina Galata (valentina.galata@uni.lu)

Consider including:
- [Snakemake unit tests](https://snakemake.readthedocs.io/en/stable/snakefiles/testing.html#snakefiles-testing) (at least for some steps/rules)
- Continuous Integration (which is currently not working)
- [Fast Forward Merge](https://docs.gitlab.com/ee/user/project/merge_requests/fast_forward_merge.html)
The current test data set might be too big for this purpose.
We might need another small and simple dummy test dataset.
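A small dummy dataset could be generated on the fly instead of being shipped as large files. The following minimal Python sketch assumes that paired-end reads sampled from a random reference are good enough for smoke tests. Such data is not a realistic metagenome, so steps such as binning will not be exercised meaningfully. The file names, read counts and lengths are hypothetical defaults. The fixed seed makes the decompressed content identical on every run; the gzip bytes themselves can differ, because the gzip header stores a timestamp.

```python
import gzip
import random
from pathlib import Path
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def write_dummy_pairs(prefix: str, n_pairs: int = 1000, read_len: int = 100,
                      insert_size: int = 300, genome_len: int = 10_000,
                      seed: int = 42) -> Tuple[Path, Path]:
    """Write <prefix>_R1.fq.gz and <prefix>_R2.fq.gz with reads from a random genome."""
    if not read_len <= insert_size <= genome_len:
        raise ValueError("need read_len <= insert_size <= genome_len")
    rng = random.Random(seed)  # fixed seed -> same reads on every run
    genome = "".join(rng.choice("ACGT") for _ in range(genome_len))
    qual = "I" * read_len  # constant quality, Q40 in Phred+33 encoding
    r1_path, r2_path = Path(f"{prefix}_R1.fq.gz"), Path(f"{prefix}_R2.fq.gz")
    with gzip.open(r1_path, "wt") as r1, gzip.open(r2_path, "wt") as r2:
        for i in range(n_pairs):
            start = rng.randrange(genome_len - insert_size + 1)
            fragment = genome[start:start + insert_size]
            read1 = fragment[:read_len]                               # forward read
            read2 = fragment[-read_len:].translate(COMPLEMENT)[::-1]  # reverse complement of the other end
            r1.write(f"@dummy{i}/1\n{read1}\n+\n{qual}\n")
            r2.write(f"@dummy{i}/2\n{read2}\n+\n{qual}\n")
    return r1_path, r2_path
```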
*TODOs: TBD*

Milestone: Overhaul · Assignee: Valentina Galata (valentina.galata@uni.lu)

# Issue #43: Formatting convention

https://git-r3lab.uni.lu/IMP/imp3/-/issues/43 · Updated: 2021-10-08T13:45:35+02:00 · Author: Valentina Galata (valentina.galata@uni.lu)

### Formatting convention
Define a formatting convention and clean up the code according to the new guidelines.
The formatting convention is defined in the [Wiki](https://git-r3lab.uni.lu/IMP/imp3/-/wikis/Code-formatting-convention).
To define:
- rule structure template/guidelines
- Python:
  - [PEP8](https://www.python.org/dev/peps/pep-0008/)
  - 4 spaces instead of tabs
  - double or single quotes?
  - comments (docstrings): reStructuredText or NumPy/SciPy style?
- settings access
  - either only global variables or `config`
  - if global variables, then these should be defined in one place

To try:
- code quality checker (linter): `snakemake --lint`
  - highlights issues to be solved to follow best practices
  - highly recommended before publishing workflows
- automatic formatter: [Snakefmt](https://github.com/snakemake/snakefmt), based on [Black](https://black.readthedocs.io/en/stable/)
References:
- [Python docstring formats](https://realpython.com/documenting-python-code/#docstring-formats)
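To make the docstring question concrete, here is a hypothetical helper (not part of IMP3) documented once in NumPy/SciPy style and once in reStructuredText (Sphinx field list) style. The second function just delegates to the first, so only the docstrings differ. The path layout `<outdir>/Preprocessing/<omic>.<suffix>` is an assumption made for the example.

```python
import os.path


def fastq_path(outdir: str, omic: str, suffix: str) -> str:
    """Build the path of a preprocessed FASTQ file.

    Parameters
    ----------
    outdir : str
        Output directory of the workflow.
    omic : str
        Omic type, e.g. ``"mg"`` or ``"mt"``.
    suffix : str
        File name suffix, e.g. ``"r1.trimmed.fq.gz"``.

    Returns
    -------
    str
        Path ``<outdir>/Preprocessing/<omic>.<suffix>``.
    """
    return os.path.join(outdir, "Preprocessing", f"{omic}.{suffix}")


def fastq_path_rst(outdir: str, omic: str, suffix: str) -> str:
    """Build the path of a preprocessed FASTQ file.

    :param outdir: Output directory of the workflow.
    :param omic: Omic type, e.g. ``"mg"`` or ``"mt"``.
    :param suffix: File name suffix, e.g. ``"r1.trimmed.fq.gz"``.
    :returns: Path ``<outdir>/Preprocessing/<omic>.<suffix>``.
    """
    return fastq_path(outdir, omic, suffix)
```

The NumPy style is more verbose but easier to read in plain text. The reST field list is more compact and is what Sphinx parses natively; the NumPy style requires the `napoleon` extension.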
### Code "cleaning"
- changes w.r.t. formatting conventions mentioned above
  - Python code
  - `snakemake` rules
  - (other scripts)
- add comments where possible
- one place for all helper functions
- no variables/functions inside rule files
- use `os.path` utils when working with paths
- replace code duplication with custom functions
- consistent indentation
- Q: can we simplify the structure in `workflow/rules/`?

Milestone: Overhaul · Assignee: Valentina Galata (valentina.galata@uni.lu)

# Issue #41: Configuration: configs, profiles and launchers

https://git-r3lab.uni.lu/IMP/imp3/-/issues/41 · Updated: 2022-04-19T13:41:44+02:00 · Author: Valentina Galata (valentina.galata@uni.lu)

Follow the recommendations from `snakemake` and the [best practices](https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html) to update and restructure the current configuration setup:
- `snakemake` config
  - move runtime specific configuration to `snakemake` profile
- `snakemake` profile
  - to set default values for command line options
- `slurm` config
  - (cluster configuration has been officially deprecated)
  - incl. in `snakemake` profile
- ~~shell-variables config~~
  - will be part of `snakemake` profile
- ~~launcher scripts~~
  - all parameters should be given via `snakemake` profiles
### References
- [Snakemake: profiles](https://snakemake.readthedocs.io/en/stable/executing/cli.html?highlight=profile#profiles)
- [Template Snakemake profile for SLURM](https://github.com/Snakemake-Profiles/slurm)

Milestone: Overhaul · Assignee: Valentina Galata (valentina.galata@uni.lu)

# Issue #38: nonpareil: simplify the rule

https://git-r3lab.uni.lu/IMP/imp3/-/issues/38 · Updated: 2022-02-10T07:35:22+01:00 · Author: Valentina Galata (valentina.galata@uni.lu)

Simplify the rule `nonpareil` in [nonpareil.smk](https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/Preprocessing/nonpareil.smk) and improve the formatting if possible.
For future reference:
- `nonpareil` does not take gzipped FASTQ files: see [issue 35](https://github.com/lmrodriguezr/nonpareil/issues/35) --> might get fixed in new version
- `nonpareil` runs are not completely reproducible: see [issue 48](https://github.com/lmrodriguezr/nonpareil/issues/48) --> might get fixed in new version

Milestone: Overhaul · Assignee: Valentina Galata (valentina.galata@uni.lu)

# Issue #36: Assembly: MG/MT assembly with MetaSPAdes

https://git-r3lab.uni.lu/IMP/imp3/-/issues/36 · Updated: 2021-08-23T11:02:03+02:00 · Author: Valentina Galata (valentina.galata@uni.lu)

Implement MG/MT assembly using `metaspades`.
Currently, only `megahit` can be used for the MG/MT ("hybrid") assembly.

# Issue #35: Config: use appropriate in-built data types

https://git-r3lab.uni.lu/IMP/imp3/-/issues/35 · Updated: 2021-09-24T10:06:46+02:00 · Author: Valentina Galata (valentina.galata@uni.lu)

* [ ] use lists instead of strings if multiple values should/can be given for an attribute
  - `steps`
  - `input` files for MG/MT
  - `summary_steps`
  - `filtering`: `filter`
  - `sortmerna`: `files`
  - `hmm_DBs`
  - `COGS`
  - `binners`
* [ ] use a boolean instead of a string for `true/false` attributes
  - e.g. `settingsLocked`
  - edit: use `True` or `False` in the `snakemake` command when overwriting parameters with `--config`
* [ ] update/expand config validation schema

Milestone: Overhaul · Assignee: Valentina Galata (valentina.galata@uni.lu)

# Issue #32: Output: minimize output size

https://git-r3lab.uni.lu/IMP/imp3/-/issues/32 · Updated: 2021-08-27T10:55:31+02:00 · Author: Valentina Galata (valentina.galata@uni.lu)

The current output of `IMP3` is huge, especially the `Preprocessing/` folder.
For a user, it might be difficult to decide which files could be deleted because they are not required later and could be re-created if necessary.
For example, `<omic>.se1.trimmed.fq.gz` and `<omic>.se2.trimmed.fq.gz` are concatenated into `<omic>.se.trimmed.fq.gz` but all three are kept after preprocessing.
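For the example above, the following minimal Python sketch lists the per-mate single-end files whose content is already contained in the concatenated file. It assumes that all three files sit directly in `Preprocessing/` and that the workflow finished successfully; it does not verify that the concatenated file is complete. It only reports candidates and deletes nothing. The function name is hypothetical.

```python
import sys
from pathlib import Path
from typing import List

MERGED_SUFFIX = ".se.trimmed.fq.gz"


def redundant_se_files(preproc_dir: str) -> List[Path]:
    """List <omic>.se1/.se2 trimmed files whose concatenated <omic>.se file exists."""
    root = Path(preproc_dir)
    candidates = []
    for merged in sorted(root.glob("*" + MERGED_SUFFIX)):
        omic = merged.name[: -len(MERGED_SUFFIX)]
        for mate in ("se1", "se2"):
            part = root / f"{omic}.{mate}.trimmed.fq.gz"
            if part.exists():
                candidates.append(part)
    return candidates


if __name__ == "__main__":
    files = redundant_se_files(sys.argv[1] if len(sys.argv) > 1 else "Preprocessing")
    for path in files:
        print(path)
    print(f"{sum(p.stat().st_size for p in files) / 1e9:.2f} GB could be freed")
```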
I would suggest discussing how best to address this issue:
- identify files which could be considered as not or less relevant, and can be re-created if needed
- define how to handle these files, e.g. removing them if a certain flag is set in the config file, or providing a list of files so that users can remove them themselves

# Issue #31: Input/output: work with compressed FASTQ files

https://git-r3lab.uni.lu/IMP/imp3/-/issues/31 · Updated: 2021-08-18T15:03:22+02:00 · Author: Valentina Galata (valentina.galata@uni.lu)

All or most rules using reads work with decompressed FASTQ files.
Especially for deeply sequenced samples, this increases the runtime and requires more disk space (even though all but the final FASTQ files are only temporary).
Nowadays, most tools included in `IMP3` (e.g. `Trimmomatic`, `cutadapt` etc.) should be able to use compressed files.
Decompression should be done only when necessary to minimize the overhead.
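Inside the workflow's own Python scripts, compressed input can be read as a stream without writing a decompressed copy to disk. The following minimal sketch assumes standard four-line FASTQ records. It detects gzip by its magic bytes rather than by the `.gz` extension. External tools such as `cutadapt` read `.gz` input themselves and are not covered here. The function names are hypothetical.

```python
import gzip
from typing import IO, Iterator, Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_text(path: str) -> IO[str]:
    """Open a plain or gzip-compressed text file, detected by content, not by extension."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (header without '@', sequence, quality) for 4-line FASTQ records."""
    with open_text(path) as handle:
        while True:
            header = handle.readline()
            if not header:  # end of file
                return
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line
            qual = handle.readline().rstrip("\n")
            yield header[1:].rstrip("\n"), seq, qual
```

Because the records are streamed, memory use stays constant regardless of file size, and no temporary decompressed file is needed.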