IMP3 issues
https://git-r3lab.uni.lu/IMP/imp3/-/issues
2021-10-05T11:08:30+02:00

https://git-r3lab.uni.lu/IMP/imp3/-/issues/52
Assembly/Analysis: replace perl scripts for contig stats
2021-10-05T11:08:30+02:00
Valentina Galata (valentina.galata@uni.lu)

## Feature request
To simplify the code (and avoid hard-coded paths in the command), replace the Perl scripts used in [contig-length.smk](https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/Assembly/common/contig-length.smk) to compute contig length and GC content with existing utilities.
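As a rough illustration of the two statistics in question, the following minimal Python sketch computes length and GC content per contig from a FASTA handle. It is not the existing Perl logic and not part of `IMP3`; the function names `read_fasta` and `contig_stats` are hypothetical, and GC content is assumed here to be the percentage of `G`/`C` characters over the full sequence length.

```python
import io


def read_fasta(handle):
    """Yield (contig_id, sequence) pairs from a FASTA text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-separated token as the contig ID
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def contig_stats(handle):
    """Yield (contig_id, length, gc_percent) for each contig (hypothetical helper)."""
    for name, seq in read_fasta(handle):
        upper = seq.upper()
        gc = upper.count("G") + upper.count("C")
        gc_percent = 100.0 * gc / len(seq) if seq else 0.0
        yield name, len(seq), gc_percent


if __name__ == "__main__":
    example = io.StringIO(">contig_1 demo\nACGT\nGGCC\n>contig_2\nATAT\n")
    for name, length, gc in contig_stats(example):
        print(f"{name}\t{length}\t{gc:.2f}")
```

An existing, maintained tool is still preferable to custom code like this; the sketch only shows how small the required computation is.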
For example, `seqkit` provides a utility for this (see the sub-command `fx2tab` [here](https://bioinf.shenwei.me/seqkit/usage/)), and the tool is already included in multiple `conda` environments.

https://git-r3lab.uni.lu/IMP/imp3/-/issues/11
Assembly: merging/consensus: replace CAP3
2021-03-17T06:51:14+01:00
Valentina Galata (valentina.galata@uni.lu)

Replace `CAP3` in the consensus assembly step:
- rule `collapse_hybrid_assemblies` in `rules/Assembly/hybrid/merge-hybrid-assembly.smk`
- rule `cap3` in `rules/Assembly/common/merge-assembly.smk`

Assembly
Valentina Galata (valentina.galata@uni.lu)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/36
Assembly: MG/MT assembly with MetaSPAdes
2021-08-23T11:02:03+02:00
Valentina Galata (valentina.galata@uni.lu)

Implement MG/MT assembly using `metaspades`.
Currently, only `megahit` can be used for the MG/MT ("hybrid") assembly.

https://git-r3lab.uni.lu/IMP/imp3/-/issues/50
Assembly stats: metaQUAST
2021-09-27T08:04:14+02:00
Valentina Galata (valentina.galata@uni.lu)

## Feature request
Should we include `metaQUAST` to get the basic statistics for the assembly/assemblies?
- [paper](https://academic.oup.com/bioinformatics/article/32/7/1088/1743987)
- [repo](https://github.com/ablab/quast)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/57
Binning: Rework binning module
2023-11-08T12:09:26+01:00
Valentina Galata (valentina.galata@uni.lu)

- Overhaul binning module
- Replace current `binny` implementation with its new version: https://github.com/a-h-b/binny

Analysis & Binning updates
Oskar Hickl

https://git-r3lab.uni.lu/IMP/imp3/-/issues/24
Bug: config tmp folder is ignored during rRNA filtering
2021-10-08T11:12:27+02:00
Valentina Galata (valentina.galata@uni.lu)

The given tmp folder path (i.e. the one specified in the `snakemake` config file) seems to be ignored by `mktemp` in the rRNA filtering rule: https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/Preprocessing/rna.filtering.smk#L10.
Although the `mktemp` command looks correct during execution, the path actually used is `/tmp`, which might cause issues for big FASTQ files.
As long as this issue is not resolved, the problem can be avoided by setting the shell variable `TMPDIR` when running `snakemake`:
```bash
TMPDIR="path/to/tmp/" snakemake ...
```

Overhaul
Valentina Galata (valentina.galata@uni.lu)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/47
Conda envs: smaller YAML files
2022-02-04T14:55:27+01:00
Valentina Galata (valentina.galata@uni.lu)

I would like to propose smaller `conda` env. YAML files instead of the big monolithic per-step files currently in use.
According to `snakemake`'s [documentation](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#distribution-and-reproducibility) the `conda` environments should be
> as finegrained as possible to improve transparency and maintainability.
In many cases, it makes sense to have per-rule or per-tool YAML files, e.g. for `MEGAHIT` or `metaSPAdes`.
This could be discussed and decided for each step and rule to avoid installing the same tools in different environments.

Overhaul
Valentina Galata (valentina.galata@uni.lu)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/41
Configuration: configs, profiles and launchers
2022-04-19T13:41:44+02:00
Valentina Galata (valentina.galata@uni.lu)

Follow the recommendations from `snakemake` and the [best practices](https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html) to update and restructure the current configuration setup:
- `snakemake` config
  - move runtime-specific configuration to `snakemake` profile
- `snakemake` profile
  - to set default values for command line options
- `slurm` config
  - (cluster configuration has been officially deprecated)
  - incl. in `snakemake` profile
- ~~shell-variables config~~
  - will be part of `snakemake` profile
- ~~launcher scripts~~
  - all parameters should be given via `snakemake` profiles
### References
- [Snakemake: profiles](https://snakemake.readthedocs.io/en/stable/executing/cli.html?highlight=profile#profiles)
- [Template Snakemake profile for SLURM](https://github.com/Snakemake-Profiles/slurm)

Overhaul
Valentina Galata (valentina.galata@uni.lu)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/35
Config: use appropriate in-built data types
2021-09-24T10:06:46+02:00
Valentina Galata (valentina.galata@uni.lu)

* [ ] use lists instead of strings if multiple values should/can be given for an attribute
  - `steps`
  - `input` files for MG/MT
  - `summary_steps`
  - `filtering`: `filter`
  - `sortmerna`: `files`
  - `hmm_DBs`
  - `COGS`
  - `binners`
* [ ] use a boolean instead of a string for `true/false` attributes
  - e.g. `settingsLocked`
  - edit: use `True` or `False` in the `snakemake` command when overwriting parameters with `--config`
* [ ] update/expand config validation schema

Overhaul
Valentina Galata (valentina.galata@uni.lu)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/18
Docs: Trimmomatic adapters
2020-11-11T13:22:57+01:00
Valentina Galata (valentina.galata@uni.lu)

- setup: default and custom adapters
- how to:
  - how to find out which adapters should be set
- troubleshooting:
  - failed preprocessing step (few SE reads, no PE reads after filtering): choosing a wrong adapter

https://git-r3lab.uni.lu/IMP/imp3/-/issues/56
Error in rule merge_assembly_cap3 - segmentation fault
2021-12-07T07:59:46+01:00
javiercnav

## Bug report
IMP3 stopped at the step `merge_assembly_cap3` without writing anything to the corresponding log file; the screen showed the message below:
Error in rule merge_assembly_cap3:
jobid: 9
output: Assembly/mg.assembly.merged.fa
log: logs/assembly_merge_assembly_cap3.log (check log file(s) for error message)
conda-env: /users/jcnavarro/conda/e7a8faf3f71c32a40cabb702d36f2415
shell:
NAME_fin=Assembly/mg.assembly
NAME=Assembly/intermediary/mg.assembly
cat Assembly/intermediary/mg.megahit_preprocessed.1.fa Assembly/intermediary/mg.megahit_unmapped.2.fa > $NAME.cat.fa
# Run cap3
cap3 $NAME.cat.fa -p 98 -o 100 > logs/assembly_merge_assembly_cap3.log 2>&1
# Concatenate assembled contigs, singletons and rename the contigs
cat $NAME.cat.fa.cap.contigs $NAME.cat.fa.cap.singlets | awk '/^>/{print ">T111_contig_" ++i; next}{print}' > $NAME_fin.merged.fa
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /mnt/san.ese/nfs/users/jcnavarro/.snakemake/log/2021-11-23T110427.419676.snakemake.log
### Log files and screenshots
see attachments
(List of logs/screenshots)
## Steps to reproduce
I ran IMP3 with a smaller dataset and it ran well.
**Code version**
```txt
commit hash
branch name
```
**Config files**
<!-- Attach used and created config files. -->
[terraces.yaml](/uploads/68a438d10a822355a43eee320154c814/terraces.yaml)
[2021-11-23T110427.419676.snakemake.log](/uploads/ac4bb93e8191ec4ab998e0065109e18e/2021-11-23T110427.419676.snakemake.log)
![Screen_shot_error](/uploads/91968b384185bdf63cce17228eebc1e6/Screen_shot_error.png)
![Screen_Shot_logs](/uploads/9054d4c57ab11de8f233f57b35bbbca6/Screen_Shot_logs.png)
[sample.config.yaml](/uploads/b82ef66a49d99e0208ecc24a0e0d8523/sample.config.yaml)
(List of config files)
**Command**
<!-- Attach used launcher script and/or provide the command below. -->
```bash
# command used to launch IMP3
snakemake -s /users/jcnavarro/IMP3/Snakefile --configfile /users/jcnavarro/playing.yaml --use-conda --conda-prefix /users/jcnavarro/conda/ --cores 20
```
**Input**
<!-- Provide relevant information about your input files. -->
R1/R2 fastq.gz for the metagenome of a soil sample. Each has a size of 20 Gb when compressed.
**System**
<!-- Short description of the system setup where you are running `IMP3`. -->
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
MemTotal: 528365004 kB
Threads/core: 64
## Possible fixes
<!-- If you can, link to the file or line of code that might be responsible for the problem - please make sure that the linked file corresponds to the indicated version of the code. -->
(Optional: How to fix)
--------------------------------------------------

https://git-r3lab.uni.lu/IMP/imp3/-/issues/25
Feature/Doc: SE (metaT) as input
2021-08-23T14:18:05+02:00
Valentina Galata (valentina.galata@uni.lu)

Not sure if this is a ~~bug,~~ missing feature or missing documentation, but the pipeline fails when only single-end (SE) reads are given as input.
Used config parameters for the raw data:
```yaml
raws:
  Metatranscriptomics: some/path/Sample_R1_001.fastq.gz
```
Error message:
```
Building DAG of jobs...
MissingInputException in line 110 of /mnt/irisgpfs/users/vgalata/projects/sofunmoni/submodules/imp3/workflow/rules/Preprocessing/trimming.smk:
Missing input files for rule trimming:
Preprocessing/mt.r2.fq
Preprocessing/mt.r1.fq
```
IMP3 version: 47a4181

Anna Buschart

https://git-r3lab.uni.lu/IMP/imp3/-/issues/43
Formatting convention
2021-10-08T13:45:35+02:00
Valentina Galata (valentina.galata@uni.lu)

### Formatting convention
Define a formatting convention and clean up the code according to the new guidelines.
The formatting convention is defined in the [Wiki](https://git-r3lab.uni.lu/IMP/imp3/-/wikis/Code-formatting-convention).
To define:
- rule structure template/guidelines
- Python:
  - [PEP8](https://www.python.org/dev/peps/pep-0008/)
  - 4 spaces instead of tabs
  - double or single quotes?
  - comments: reStructuredText or NumPy/SciPy?
- settings access
  - either only global variables or `config`
  - if global variables, then these should be defined in one place

To try:
- code quality checker (linter): `snakemake --lint`
  - highlights issues to be solved to follow best practices
  - highly recommended before publishing workflows
- automatic formatter: [Snakefmt](https://github.com/snakemake/snakefmt), based on [Black](https://black.readthedocs.io/en/stable/)
References:
- [Python docstring formats](https://realpython.com/documenting-python-code/#docstring-formats)
### Code "cleaning"
- changes w.r.t. formatting conventions mentioned above
  - Python code
  - `snakemake` rules
  - (other scripts)
- add comments where possible
- one place for all helper functions
- no variables/functions inside rule files
- use `os.path` utils when working with paths
- replace code duplication with custom functions
- consistent indentation
- Q: can we simplify the structure in `workflow/rules/`?

Overhaul
Valentina Galata (valentina.galata@uni.lu)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/64
IMP Test Runs
2023-11-13T18:43:49+01:00
Ricardo Parise

- [ ] simple test running on all machines, example: bigmem
- [ ] set up tests for the different input modalities
- [ ] document this on the readthedocs

Ricardo Parise

https://git-r3lab.uni.lu/IMP/imp3/-/issues/31
Input/output: work with compressed FASTQ files
2021-08-18T15:03:22+02:00
Valentina Galata (valentina.galata@uni.lu)

All or most rules using reads work with decompressed FASTQ files.
Especially for deeply sequenced samples, this increases the runtime, and requires more space (even if only temporary for all but the final FASTQ files).
Nowadays, most tools included in `IMP3` (e.g. `Trimmomatic`, `cutadapt` etc.) should be able to use compressed files.
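For custom helper scripts, reading gzipped FASTQ files directly is also straightforward. The following is a minimal sketch using only the Python standard library; `open_fastq` and `count_reads` are hypothetical helpers, not existing `IMP3` functions, and the read count assumes the common four-lines-per-record FASTQ layout.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a FASTQ file in text mode, handling gzip transparently (hypothetical helper)."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(path):
    """Count FASTQ records, assuming exactly four lines per record."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```

With a helper like this, a script can stream the compressed file and no decompressed copy has to be written to disk.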
Decompression should be done only when necessary to minimize the overhead.

https://git-r3lab.uni.lu/IMP/imp3/-/issues/55
Job fails due to conda environment installation
2021-12-08T10:36:25+01:00
michoug

## Bug report
<!-- Describe the bug/error you have encountered. -->
Hi,
When launching IMP3 jobs on the iris cluster with the attached files, the following error occurs in some cases:
[GL100_DN.launchIMP.sh](/uploads/cb56a27e3e15b0958c4d9f2f46800a53/GL100_DN.launchIMP.sh)
[GL100_DN.runIMP.sh](/uploads/978beb0ffa116de85ab40228362ec25d/GL100_DN.runIMP.sh)
[GL100_DN_config.yaml](/uploads/eff8270c0d05fa76b89dbf6245123e74/GL100_DN_config.yaml)
```
/home/users/gmichoud/IMP3/.
Building DAG of jobs...
Creating conda environment /home/users/gmichoud/IMP3/workflow/rules/ini/../../envs/IMP_binning.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/users/gmichoud/IMP3/workflow/rules/ini/../../envs/IMP_binning.yaml:
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working...
####################################################################################
das_tool version 1.1.1-2 has been successfully installed!
This software by default runs with USEARCH. You can install it from the following links or use DIAMOND '--search_engine diamond'
> Download: http://www.drive5.com/usearch/download.html
> Installation instruction: http://www.drive5.com/usearch/manual/install.html
done
ERROR conda.core.link:_execute(699): An error occurred while installing package 'conda-forge::fribidi-1.0.5-h516909a_1002'.
Rolling back transaction: ...working... done
CondaError: Cannot link a source that does not exist. /home/users/gmichoud/IMP3/conda/4788f34499eae03fa997bfd1ba6c307c/.condatmp/e7595ab0-dd9c-485f-beb1-48de0bac8174
Running `conda clean --packages` may resolve your problem.
()
```
The Python version I use is 3.9.1, installed via Miniconda.
The git branch is up to date (as of yesterday).
Any ideas?
Best
Greg

https://git-r3lab.uni.lu/IMP/imp3/-/issues/63
Mantis GFF script: decrease runtime for big samples
2022-09-07T17:15:42+02:00
Valentina Galata (valentina.galata@uni.lu)

Speed up the code in [mantis_gff.py](https://gitlab.lcsb.uni.lu/IMP/imp3/-/blob/issue59/workflow/scripts/Analysis/mantis_gff.py) for big samples with many annotations.

Analysis & Binning updates
Valentina Galata (valentina.galata@uni.lu)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/15
MANTIS: setup and rules
2022-03-02T16:08:45+01:00
Valentina Galata (valentina.galata@uni.lu)

[MANTIS](https://github.com/PedroMTQ/mantis) setup
branch: https://git-r3lab.uni.lu/IMP/imp3/-/tree/mantis
* [x] `git submodule`
* [x] `conda` env.
* [x] check installation
* [x] add rules, config param.s etc.
* [x] test run (incl. binning)
* [x] update the [mantis branch](https://git-r3lab.uni.lu/IMP/imp3/-/tree/mantis)
* [x] switch from GitLab repo to GitHub repo
* [x] update the code (see issue #22)
* [x] databases download/setup (see issue #22)
* [x] replace `git submodule` by `conda` installation (i.e. env. YAML file)
* [ ] replace HMM search w/ `essential` HMM by `mantis` annotation ([list of genes in Mantis](https://github.com/PedroMTQ/mantis/tree/master/Resources/essential_genes)) --> updated version: `1.3`
  - *pending*: an update of `binny` might make this task obsolete

Analysis & Binning updates
pedro queiros (pedro.queiros@uni.lu)
Valentina Galata (valentina.galata@uni.lu)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/38
nonpareil: simplify the rule
2022-02-10T07:35:22+01:00
Valentina Galata (valentina.galata@uni.lu)

Simplify the rule `nonpareil` in [nonpareil.smk](https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/Preprocessing/nonpareil.smk) and improve the formatting if possible.
For future reference:
- `nonpareil` does not take gzipped FASTQ files: see [issue 35](https://github.com/lmrodriguezr/nonpareil/issues/35) --> might get fixed in new version
- `nonpareil` runs are not completely reproducible: see [issue 48](https://github.com/lmrodriguezr/nonpareil/issues/48) --> might get fixed in new version

Overhaul
Valentina Galata (valentina.galata@uni.lu)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/32
Output: minimize output size
2021-08-27T10:55:31+02:00
Valentina Galata (valentina.galata@uni.lu)

The current output of `IMP3` is huge, especially the folder `Preprocessing/`.
For a user, it might be difficult to decide which files could be deleted because they are not required later and could be re-created if necessary.
For example, `<omic>.se1.trimmed.fq.gz` and `<omic>.se2.trimmed.fq.gz` are concatenated into `<omic>.se.trimmed.fq.gz` but all three are kept after preprocessing.
I suggest discussing how best to address this issue:
- identify files which could be considered as not or less relevant, and can be re-created if needed
- define how to handle these files, e.g. removing these if a certain flag is set in the config file or provide a list of files so the users can remove them themselves
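
To make the second option more concrete, here is a minimal Python sketch of a cleanup helper that lists re-creatable files and deletes them only on request. Everything in it is an assumption for illustration: the names `REMOVABLE_PATTERNS`, `find_removable` and `cleanup` are hypothetical, and the glob patterns only cover the `se1`/`se2` example mentioned above, assuming those files sit directly in `Preprocessing/`.

```python
from pathlib import Path

# Hypothetical patterns for files that can be re-created if needed;
# the actual list would have to be agreed on per step.
REMOVABLE_PATTERNS = (
    "Preprocessing/*.se1.trimmed.fq.gz",
    "Preprocessing/*.se2.trimmed.fq.gz",
)


def find_removable(outdir, patterns=REMOVABLE_PATTERNS):
    """Return a sorted list of existing files in outdir matching the patterns."""
    outdir = Path(outdir)
    found = set()
    for pattern in patterns:
        found.update(p for p in outdir.glob(pattern) if p.is_file())
    return sorted(found)


def cleanup(outdir, remove=False, patterns=REMOVABLE_PATTERNS):
    """List removable files and their total size in bytes; delete only if remove=True."""
    files = find_removable(outdir, patterns)
    total_bytes = sum(f.stat().st_size for f in files)
    if remove:
        for f in files:
            f.unlink()
    return files, total_bytes
```

Calling `cleanup(outdir)` without `remove=True` acts as a dry run, so users can inspect the list (and the space that would be freed) before deleting anything; the `remove` argument could be driven by a flag in the config file.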