IMP issues
https://git-r3lab.uni.lu/groups/IMP/-/issues
2023-11-13T18:43:49+01:00

https://git-r3lab.uni.lu/IMP/imp3/-/issues/64
IMP Test Runs
2023-11-13T18:43:49+01:00
Ricardo Parise
- [ ] simple test running on all machines, example: bigmem
- [ ] set up tests for the different input modalities
- [ ] document this on the readthedocs

Ricardo Parise

https://git-r3lab.uni.lu/IMP/imp3/-/issues/63
Mantis GFF script: decrease runtime for big samples
2022-09-07T17:15:42+02:00
Valentina Galata valentina.galata@uni.lu
Speed up the code in [mantis_gff.py](https://gitlab.lcsb.uni.lu/IMP/imp3/-/blob/issue59/workflow/scripts/Analysis/mantis_gff.py) for big samples with many annotations.
Analysis & Binning updates
Valentina Galata valentina.galata@uni.lu

https://git-r3lab.uni.lu/IMP/IMP/-/issues/166
Error in hybrid assembly occurred
2022-04-30T04:31:41+02:00
Mika
Hi, when I run the assembly step, the following error occurs in the hybrid assembly.
How can I fix this? Thank you.
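A note on reading the log that follows: the failing shell command is reported with `returned non-zero exit status 137`, and the line before it shows `Killed` for `idba_ud`. By shell convention, an exit status above 128 means the process was terminated by signal number `status - 128`, so 137 corresponds to signal 9 (SIGKILL). On Linux this often, though not always, points to the kernel's out-of-memory killer; whether that is the cause here is an assumption that would have to be checked against the memory available on the host. A minimal Python sketch of the decoding (the helper name `describe_exit_status` is hypothetical and not part of IMP):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell exit status, decoding 'killed by signal' codes (> 128)."""
    if status == 0:
        return "success"
    if status > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"failed with exit code {status}"


print(describe_exit_status(137))
```

If the job was indeed killed for memory reasons, the usual directions to explore are running on a machine with more memory or giving the assembler less data; neither is confirmed as the fix for this report.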
```
$impy assembly -m ${MG1} -m ${MG2} -m ${MG_single} -t ${MT1} -t ${MT2} -t ${MT_single} -o ${Dir_out}
.
.
.
(omitted)
.
.
.
rule megahit_assembly_from_unmapped:
input: Assembly/mt.r1.unmapped.fq, Assembly/mt.r2.unmapped.fq, Assembly/mt.se.unmapped.fq
output: Assembly/mt.megahit_unmapped.2/final.contigs.fa, Assembly/mt.megahit_unmapped.2.fa
wildcards: loop=2, type=mt
[x] Performing mt assembly step '2' using MEGAHIT
MEGAHIT v1.0.6
--- [Tue Apr 19 16:27:43 2022] Start assembly. Number of CPU threads 4 ---
--- [Tue Apr 19 16:27:43 2022] Available memory: 33675894784, used: 24000000000
--- [Tue Apr 19 16:27:43 2022] k list: 25,29,33,37,41,45,49,53,57,61,65,69,73,77,81,85,89,93,97,99 ---
--- [Tue Apr 19 16:27:43 2022] Converting reads to binaries ---
[read_lib_functions-inl.h : 209] Lib 0 (Assembly/mt.r1.unmapped.fq,Assembly/mt.r2.unmapped.fq): pe, 159143522 reads, 100 max length
[read_lib_functions-inl.h : 209] Lib 1 (Assembly/mt.se.unmapped.fq): se, 0 reads, 0 max length
[utils.h : 126] Real: 747.1210 user: 148.4221 sys: 71.0971 maxrss: 155120
--- [Tue Apr 19 16:40:10 2022] Extracting solid (k+1)-mers for k = 25 ---
--- [Tue Apr 19 16:54:03 2022] Building graph for k = 25 ---
--- [Tue Apr 19 16:54:08 2022] Assembling contigs from SdBG for k = 25 ---
--- [Tue Apr 19 16:54:22 2022] Local assembling k = 25 ---
--- [Tue Apr 19 16:56:46 2022] Extracting iterative edges from k = 25 to 29 ---
--- [Tue Apr 19 17:09:31 2022] Building graph for k = 29 ---
--- [Tue Apr 19 17:09:34 2022] Assembling contigs from SdBG for k = 29 ---
--- [Tue Apr 19 17:09:48 2022] Local assembling k = 29 ---
--- [Tue Apr 19 17:12:15 2022] Extracting iterative edges from k = 29 to 33 ---
--- [Tue Apr 19 17:25:23 2022] Building graph for k = 33 ---
--- [Tue Apr 19 17:25:27 2022] Assembling contigs from SdBG for k = 33 ---
--- [Tue Apr 19 17:25:43 2022] Local assembling k = 33 ---
--- [Tue Apr 19 17:28:13 2022] Extracting iterative edges from k = 33 to 37 ---
--- [Tue Apr 19 17:41:24 2022] Building graph for k = 37 ---
--- [Tue Apr 19 17:41:27 2022] Assembling contigs from SdBG for k = 37 ---
--- [Tue Apr 19 17:41:45 2022] Local assembling k = 37 ---
--- [Tue Apr 19 17:43:45 2022] Extracting iterative edges from k = 37 to 41 ---
--- [Tue Apr 19 17:55:27 2022] Building graph for k = 41 ---
--- [Tue Apr 19 17:55:31 2022] Assembling contigs from SdBG for k = 41 ---
--- [Tue Apr 19 17:55:48 2022] Local assembling k = 41 ---
--- [Tue Apr 19 17:57:48 2022] Extracting iterative edges from k = 41 to 45 ---
--- [Tue Apr 19 18:08:51 2022] Building graph for k = 45 ---
--- [Tue Apr 19 18:08:55 2022] Assembling contigs from SdBG for k = 45 ---
--- [Tue Apr 19 18:09:12 2022] Local assembling k = 45 ---
--- [Tue Apr 19 18:11:13 2022] Extracting iterative edges from k = 45 to 49 ---
--- [Tue Apr 19 18:21:23 2022] Building graph for k = 49 ---
--- [Tue Apr 19 18:21:27 2022] Assembling contigs from SdBG for k = 49 ---
--- [Tue Apr 19 18:21:44 2022] Local assembling k = 49 ---
--- [Tue Apr 19 18:23:45 2022] Extracting iterative edges from k = 49 to 53 ---
--- [Tue Apr 19 18:34:01 2022] Building graph for k = 53 ---
--- [Tue Apr 19 18:34:05 2022] Assembling contigs from SdBG for k = 53 ---
--- [Tue Apr 19 18:34:20 2022] Local assembling k = 53 ---
--- [Tue Apr 19 18:36:22 2022] Extracting iterative edges from k = 53 to 57 ---
--- [Tue Apr 19 18:45:32 2022] Building graph for k = 57 ---
--- [Tue Apr 19 18:45:35 2022] Assembling contigs from SdBG for k = 57 ---
--- [Tue Apr 19 18:45:48 2022] Local assembling k = 57 ---
--- [Tue Apr 19 18:47:52 2022] Extracting iterative edges from k = 57 to 61 ---
--- [Tue Apr 19 18:55:52 2022] Building graph for k = 61 ---
--- [Tue Apr 19 18:55:55 2022] Assembling contigs from SdBG for k = 61 ---
--- [Tue Apr 19 18:56:06 2022] Local assembling k = 61 ---
--- [Tue Apr 19 18:58:09 2022] Extracting iterative edges from k = 61 to 65 ---
--- [Tue Apr 19 19:05:34 2022] Building graph for k = 65 ---
--- [Tue Apr 19 19:05:36 2022] Assembling contigs from SdBG for k = 65 ---
--- [Tue Apr 19 19:05:45 2022] Local assembling k = 65 ---
--- [Tue Apr 19 19:07:48 2022] Extracting iterative edges from k = 65 to 69 ---
--- [Tue Apr 19 19:15:06 2022] Building graph for k = 69 ---
--- [Tue Apr 19 19:15:08 2022] Assembling contigs from SdBG for k = 69 ---
--- [Tue Apr 19 19:15:17 2022] Local assembling k = 69 ---
--- [Tue Apr 19 19:17:18 2022] Extracting iterative edges from k = 69 to 73 ---
--- [Tue Apr 19 19:23:38 2022] Building graph for k = 73 ---
--- [Tue Apr 19 19:23:40 2022] Assembling contigs from SdBG for k = 73 ---
--- [Tue Apr 19 19:23:46 2022] Local assembling k = 73 ---
--- [Tue Apr 19 19:26:08 2022] Extracting iterative edges from k = 73 to 77 ---
--- [Tue Apr 19 19:31:26 2022] Building graph for k = 77 ---
--- [Tue Apr 19 19:31:28 2022] Assembling contigs from SdBG for k = 77 ---
--- [Tue Apr 19 19:31:32 2022] Local assembling k = 77 ---
--- [Tue Apr 19 19:33:29 2022] Extracting iterative edges from k = 77 to 81 ---
--- [Tue Apr 19 19:38:04 2022] Building graph for k = 81 ---
--- [Tue Apr 19 19:38:06 2022] Assembling contigs from SdBG for k = 81 ---
--- [Tue Apr 19 19:38:09 2022] Local assembling k = 81 ---
--- [Tue Apr 19 19:40:01 2022] Extracting iterative edges from k = 81 to 85 ---
--- [Tue Apr 19 19:43:38 2022] Building graph for k = 85 ---
--- [Tue Apr 19 19:43:40 2022] Assembling contigs from SdBG for k = 85 ---
--- [Tue Apr 19 19:43:42 2022] Local assembling k = 85 ---
--- [Tue Apr 19 19:45:34 2022] Extracting iterative edges from k = 85 to 89 ---
--- [Tue Apr 19 19:48:30 2022] Building graph for k = 89 ---
--- [Tue Apr 19 19:48:31 2022] Assembling contigs from SdBG for k = 89 ---
--- [Tue Apr 19 19:48:33 2022] Local assembling k = 89 ---
--- [Tue Apr 19 19:50:22 2022] Extracting iterative edges from k = 89 to 93 ---
--- [Tue Apr 19 19:52:38 2022] Building graph for k = 93 ---
--- [Tue Apr 19 19:52:39 2022] Assembling contigs from SdBG for k = 93 ---
--- [Tue Apr 19 19:52:40 2022] Local assembling k = 93 ---
--- [Tue Apr 19 19:54:39 2022] Extracting iterative edges from k = 93 to 97 ---
--- [Tue Apr 19 19:56:32 2022] Building graph for k = 97 ---
--- [Tue Apr 19 19:56:33 2022] Assembling contigs from SdBG for k = 97 ---
--- [Tue Apr 19 19:56:34 2022] Local assembling k = 97 ---
--- [Tue Apr 19 19:58:26 2022] Extracting iterative edges from k = 97 to 99 ---
--- [Tue Apr 19 20:00:00 2022] Building graph for k = 99 ---
--- [Tue Apr 19 20:00:00 2022] Assembling contigs from SdBG for k = 99 ---
--- [Tue Apr 19 20:00:01 2022] Merging to output final contigs ---
--- [STAT] 66 contigs, total 24000 bp, min 203 bp, max 599 bp, avg 364 bp, N50 358 bp
--- [Tue Apr 19 20:00:02 2022] ALL DONE. Time elapsed: 12739.113557 seconds ---
5 of 15 steps (33%) done
rule idba_hybrid_assembly_1:
input: Preprocessing/mg.r1.preprocessed.fq, Preprocessing/mg.r2.preprocessed.fq, Preprocessing/mg.se.preprocessed.fq, Preprocessing/mt.r1.preprocessed.fq, Preprocessing/mt.r2.preprocessed.fq, Preprocessing/mt.se.preprocessed.fq, Assembly/mt.megahit_preprocessed.1/final.contigs.fa, Assembly/mt.megahit_unmapped.2/final.contigs.fa
output: Assembly/mgmt.idba_hybrid.1.fa
[x] Performing first hyrbid assembly step using IDBA
[x] Interleave MG and MT fastq files
[x] Join MG and MT interleaved fasta files
[x] Concatenate MT contigs, MT and MG single end files
number of threads 4
bash: line 14: 12672 Killed idba_ud -r $TMPD/merged.fa -l $TMPD/MT_contigs-MG_MT.SE.fa -o $TMPD --mink 25 --maxk 99 --step 4 --num_threads 4 --similar 0.98 --pre_correction
Error in job idba_hybrid_assembly_1 while creating output file Assembly/mgmt.idba_hybrid.1.fa.
RuleException:
CalledProcessError in line 16 of /home/imp/code/rules/Assembly/hybrid/idba.hybrid.rules:
Command '
echo "[x] Performing first hyrbid assembly step using IDBA"
echo "[x] Interleave MG and MT fastq files"
TMPD=$(mktemp -d -t --tmpdir=/home/imp/output/tmp "XXXXXX")
fq2fa --merge Preprocessing/mg.r1.preprocessed.fq Preprocessing/mg.r2.preprocessed.fq $TMPD/merged_MG.fa
fq2fa --merge Preprocessing/mt.r1.preprocessed.fq Preprocessing/mt.r2.preprocessed.fq $TMPD/merged_MT.fa
echo "[x] Join MG and MT interleaved fasta files"
cat $TMPD/merged_MG.fa $TMPD/merged_MT.fa > $TMPD/merged.fa
echo "[x] Concatenate MT contigs, MT and MG single end files"
cat <(cat Assembly/mt.megahit_preprocessed.1/final.contigs.fa Assembly/mt.megahit_unmapped.2/final.contigs.fa | awk '/^>/{print ">contig_MT_" ++i; next}{print}') <(cat Preprocessing/mg.se.preprocessed.fq | sed -n '1~4s/^@/>/p;2~4p') <(cat Preprocessing/mt.se.preprocessed.fq | sed -n '1~4s/^@/>/p;2~4p') > $TMPD/MT_contigs-MG_MT.SE.fa
idba_ud -r $TMPD/merged.fa -l $TMPD/MT_contigs-MG_MT.SE.fa -o $TMPD --mink 25 --maxk 99 --step 4 --num_threads 4 --similar 0.98 --pre_correction
mv $TMPD/contig.fa Assembly/mgmt.idba_hybrid.1.fa
rm -rf $TMPD
' returned non-zero exit status 137
File "/home/imp/code/rules/Assembly/hybrid/idba.hybrid.rules", line 16, in __rule_idba_hybrid_assembly_1
File "/usr/lib/python3.4/concurrent/futures/thread.py", line 54, in run
Will exit after finishing currently running jobs.
/home/imp/data/1018_metagenome_1.fastq => Preprocessing/mg.r1.fq
/home/imp/data/1018_metagenome_2.fastq => Preprocessing/mg.r2.fq
/home/imp/data/1018_metagenome_single.fastq => Preprocessing/mg.se.fq
/home/imp/data/1018_transcriptome_1.fastq => Preprocessing/mt.r1.fq
/home/imp/data/1018_transcriptome_2.fastq => Preprocessing/mt.r2.fq
/home/imp/data/1018_transcripptome_single.fastq => Preprocessing/mt.se.fq
symlink mg.r1.fq => Preprocessing/mg.r1.preprocessed.fq
symlink mg.r2.fq => Preprocessing/mg.r2.preprocessed.fq
symlink mg.se.fq => Preprocessing/mg.se.preprocessed.fq
symlink mt.r1.fq => Preprocessing/mt.r1.preprocessed.fq
symlink mt.r2.fq => Preprocessing/mt.r2.preprocessed.fq
symlink mt.se.fq => Preprocessing/mt.se.preprocessed.fq
Exiting because a job execution failed. Look above for error message
```

https://git-r3lab.uni.lu/IMP/IMP/-/issues/165
input data error
2022-04-25T17:51:44+02:00
EE
Hello,
I use Docker to run IMP as a non-root user. After following the installation instructions, I tried to run IMP with this command: `impy --threads 8 --memtotal 32 --memcore 4 run -m /root/esraa/dna_data/dna_sample1_R1.fq -m /root/esraa/dna_data/dna_sample1_R2.fq -o output --single-omics`
**I got this error:**
**MissingInputException in line 90 of /home/imp/code/rules/data.input.rules: Missing input files for rule prepare_input_data: /home/imp/data/dna_sample1_R1.fq /home/imp/data/dna_sample1_R2.fq**
The error says that my data is located in the /home/imp/data/ directory, but I do not have this directory.
Running **impy --enter ...** also showed that I do not have the data/ directory:
**\~/code$ ls CHANGELOG MANIFEST.in README.rst VERSION docker impy.py outpu requirements.txt setup.py test LICENSE PyPI.md Snakefile conf imp-db lib output rules src workflows**
How can I fix this problem?
Thank you

https://git-r3lab.uni.lu/IMP/IMP/-/issues/163
Installation error with "impy install-imp-container"
2022-04-22T15:00:37+02:00
Mimi
Hi, I tried to install the container using `impy` and got a urlopen error. Can you help? Thank you.
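For context on the traceback that follows: it shows `impy` calling `urlopen(...)` directly, so a network failure surfaces as a raw `URLError` (here `[Errno 113] No route to host`, which means the download host could not be reached from this machine). The sketch below is only an illustration of how such a download could report an unreachable host more readably. `download_file` and its `opener` parameter are hypothetical names and are not part of `impy`.

```python
import shutil
from urllib.error import URLError
from urllib.request import urlopen


def download_file(url, fname, opener=urlopen):
    """Download url to fname and turn network failures into a readable error.

    `opener` is injectable (a hypothetical design choice) so the function
    can be exercised without network access.
    """
    try:
        with opener(url) as response, open(fname, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
    except URLError as err:
        # err.reason carries the underlying OSError, e.g. "No route to host".
        raise RuntimeError(f"Could not reach {url}: {err.reason}") from err
    return fname
```

Before retrying the installation, it may be worth checking whether the tarball URL printed by `impy` can be reached from the same machine at all, for example from a browser.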
```
$ impy install-imp-container
[x] Downloading IMP TARBALL at 'https://webdav-r3lab.uni.lu/public/R3lab/IMP/dist/imp-1.4.1.tar.gz'
Traceback (most recent call last):
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/http/client.py", line 1282, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/http/client.py", line 1328, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/http/client.py", line 1277, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/http/client.py", line 1037, in _send_output
self.send(msg)
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/http/client.py", line 975, in send
self.connect()
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/http/client.py", line 1447, in connect
super().connect()
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/http/client.py", line 941, in connect
self.sock = self._create_connection(
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/socket.py", line 845, in create_connection
raise err
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/socket.py", line 833, in create_connection
sock.connect(sa)
OSError: [Errno 113] No route to host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/bin/impy", line 8, in <module>
sys.exit(cli())
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/site-packages/impy.py", line 235, in install_imp_container
with urlopen(ctx.obj['image-repo']) as response, open(fname, 'wb') as out_file:
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/home/user/.pyenv/versions/anaconda3-2018.12/envs/IMP/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 113] No route to host>
```

https://git-r3lab.uni.lu/IMP/imp3/-/issues/57
Binning: Rework binning module
2023-11-08T12:09:26+01:00
Valentina Galata valentina.galata@uni.lu
- Overhaul binning module
- Replace current `binny` implementation with its new version: https://github.com/a-h-b/binny

Analysis & Binning updates
Oskar Hickl

https://git-r3lab.uni.lu/IMP/imp3/-/issues/56
Error in rule merge_assembly_cap3 - segmentation fault
2021-12-07T07:59:46+01:00
javiercnav
## Bug report
IMP3 stopped at the `merge_assembly_cap3` step without writing anything to the corresponding log file; the screen showed the message below:
Error in rule merge_assembly_cap3:
jobid: 9
output: Assembly/mg.assembly.merged.fa
log: logs/assembly_merge_assembly_cap3.log (check log file(s) for error message)
conda-env: /users/jcnavarro/conda/e7a8faf3f71c32a40cabb702d36f2415
shell:
NAME_fin=Assembly/mg.assembly
NAME=Assembly/intermediary/mg.assembly
cat Assembly/intermediary/mg.megahit_preprocessed.1.fa Assembly/intermediary/mg.megahit_unmapped.2.fa > $NAME.cat.fa
# Run cap3
cap3 $NAME.cat.fa -p 98 -o 100 > logs/assembly_merge_assembly_cap3.log 2>&1
# Concatenate assembled contigs, singletons and rename the contigs
cat $NAME.cat.fa.cap.contigs $NAME.cat.fa.cap.singlets | awk '/^>/{print ">T111_contig_" ++i; next}{print}' > $NAME_fin.merged.fa
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /mnt/san.ese/nfs/users/jcnavarro/.snakemake/log/2021-11-23T110427.419676.snakemake.log
### Log files and screenshots
see attachments
(List of logs/screenshots)
## Steps to reproduce
I ran IMP3 with a smaller dataset and it ran well.
**Code version**
```txt
commit hash
branch name
```
**Config files**
<!-- Attach used and created config files. -->
[terraces.yaml](/uploads/68a438d10a822355a43eee320154c814/terraces.yaml)
[2021-11-23T110427.419676.snakemake.log](/uploads/ac4bb93e8191ec4ab998e0065109e18e/2021-11-23T110427.419676.snakemake.log)
![Screen_shot_error](/uploads/91968b384185bdf63cce17228eebc1e6/Screen_shot_error.png)
![Screen_Shot_logs](/uploads/9054d4c57ab11de8f233f57b35bbbca6/Screen_Shot_logs.png)
[sample.config.yaml](/uploads/b82ef66a49d99e0208ecc24a0e0d8523/sample.config.yaml)
(List of config files)
**Command**
<!-- Attach used launcher script and/or provide the command below. -->
```bash
# command used to launch IMP3
snakemake -s /users/jcnavarro/IMP3/Snakefile --configfile /users/jcnavarro/playing.yaml --use-conda --conda-prefix /users/jcnavarro/conda/ --cores 20
```
**Input**
<!-- Provide relevant information about your input files. -->
R1/R2 fastq.gz for the metagenome of a soil sample. Each has a size of 20 Gb when compressed.
**System**
<!-- Short description of the system setup where you are running `IMP3`. -->
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
MemTotal: 528365004 kB
Threads/core: 64
## Possible fixes
<!-- If you can, link to the file or line of code that might be responsible for the problem - please make sure that the linked file corresponds to the indicated version of the code. -->
(Optional: How to fix)
--------------------------------------------------

https://git-r3lab.uni.lu/IMP/imp3/-/issues/55
Job fails due to conda environment installation
2021-12-08T10:36:25+01:00
michoug
## Bug report
<!-- Describe the bug/error you have encountered. -->
Hi,
When launching IMP3 jobs on the iris cluster with the attached files, this error occurs in some cases:
[GL100_DN.launchIMP.sh](/uploads/cb56a27e3e15b0958c4d9f2f46800a53/GL100_DN.launchIMP.sh)
[GL100_DN.runIMP.sh](/uploads/978beb0ffa116de85ab40228362ec25d/GL100_DN.runIMP.sh)
[GL100_DN_config.yaml](/uploads/eff8270c0d05fa76b89dbf6245123e74/GL100_DN_config.yaml)
```
/home/users/gmichoud/IMP3/.
Building DAG of jobs...
Creating conda environment /home/users/gmichoud/IMP3/workflow/rules/ini/../../envs/IMP_binning.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/users/gmichoud/IMP3/workflow/rules/ini/../../envs/IMP_binning.yaml:
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working...
####################################################################################
das_tool version 1.1.1-2 has been successfully installed!
This software by default runs with USEARCH. You can install it from the following links or use DIAMOND '--search_engine diamond'
> Download: http://www.drive5.com/usearch/download.html
> Installation instruction: http://www.drive5.com/usearch/manual/install.html
done
ERROR conda.core.link:_execute(699): An error occurred while installing package 'conda-forge::fribidi-1.0.5-h516909a_1002'.
Rolling back transaction: ...working... done
CondaError: Cannot link a source that does not exist. /home/users/gmichoud/IMP3/conda/4788f34499eae03fa997bfd1ba6c307c/.condatmp/e7595ab0-dd9c-485f-beb1-48de0bac8174
Running `conda clean --packages` may resolve your problem.
()
```
The Python version I use is 3.9.1, installed via Miniconda.
The git branch is up to date (as of yesterday).
Any ideas?
Best
Greg

https://git-r3lab.uni.lu/IMP/imp3/-/issues/53
Refactoring
2021-11-15T08:33:46+01:00
pedro queiros pedro.queiros@uni.lu
Overhaul
pedro queiros pedro.queiros@uni.lu

https://git-r3lab.uni.lu/IMP/imp3/-/issues/52
Assembly/Analysis: replace perl scripts for contig stats
2021-10-05T11:08:30+02:00
Valentina Galata valentina.galata@uni.lu
## Feature request
To simplify the code (and avoid hard-coded paths in the CMD), replace the Perl scripts used in [contig-length.smk](https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/Assembly/common/contig-length.smk) to compute contig length and GC content with existing utilities.
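As an illustration of the two statistics in question, here is a minimal pure-Python sketch that computes contig length and GC content from FASTA text. It is not the IMP3 implementation and not a concrete replacement proposal; the function name `contig_stats` and the choice to report GC as a percentage are assumptions made for the example.

```python
def contig_stats(fasta_lines):
    """Yield (contig_id, length, gc_percent) for each FASTA record."""
    name, seq_parts = None, []

    def summarise(contig_id, parts):
        seq = "".join(parts).upper()
        gc = seq.count("G") + seq.count("C")
        gc_percent = 100.0 * gc / len(seq) if seq else 0.0
        return contig_id, len(seq), round(gc_percent, 2)

    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield summarise(name, seq_parts)
            # Keep only the ID (header text up to the first whitespace).
            name, seq_parts = (line[1:].split() or [""])[0], []
        else:
            seq_parts.append(line)
    if name is not None:
        yield summarise(name, seq_parts)


example = [">contig_1 demo", "ACGT", "GGCC", ">contig_2", "ATAT"]
for record in contig_stats(example):
    print(record)
```

The `seqkit fx2tab` sub-command mentioned next reports the same kind of per-sequence values without any custom script.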
For example, `seqkit` provides a utility to do that (see sub-command `fx2tab` [here](https://bioinf.shenwei.me/seqkit/usage/)) and this tool is already included in multiple `conda` environments.

https://git-r3lab.uni.lu/IMP/imp3/-/issues/51
Preprocessing: kneaddata for reads filtering
2021-09-30T13:10:26+02:00
Valentina Galata valentina.galata@uni.lu
## Feature request
I propose considering `kneaddata` for read filtering.
> This tool aims to perform principled in silico separation of bacterial reads from these "contaminant" reads, be they from the host, from bacterial 16S sequences, or other user-defined sources.
- can be installed via `conda`
- can use multiple references for filtering
- outputs reads mapped to each given reference in separate FASTQ files
- (runs `fastqc` for the input/output FASTQ files)
The rRNA filtering step could be included there as well or it could still be a separate rule.
With or without the rRNA filtering, this would reduce the code complexity considerably: there would be no need for those "chained" FASTQ files with multiple filtering-suffixes in their names.
The trimming step included in `kneaddata` can and has to be skipped because of the optional poly-G trimming which has to be done prior to filtering.
`kneaddata`:
- [web site](https://huttenhower.sph.harvard.edu/kneaddata/)
- [repo](https://github.com/biobakery/kneaddata)
- [tutorials](https://github.com/biobakery/biobakery/wiki/kneaddata#paired-end-reads)
- [forum](https://forum.biobakery.org/c/Infrastructure-and-utilities/KneadData/8)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/50
Assembly stats: metaQUAST
2021-09-27T08:04:14+02:00
Valentina Galata valentina.galata@uni.lu
## Feature request
Should we include `metaQUAST` to get the basic statistics for the assembly/assemblies?
- [paper](https://academic.oup.com/bioinformatics/article/32/7/1088/1743987)
- [repo](https://github.com/ablab/quast)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/48
Unexpected behavior: `ancient` keyword may trigger Snakemake scheduler issues
2021-09-16T00:33:24+02:00
Susheel Busi
@valentina.galata and @anna.buschart
- When running metaG samples through `imp3` [commit ecead78], I noticed the following message/warning in the SLURM log file:
```
Failed to solve scheduling problem with ILP solver in time (10s). Falling back to greedy solver.
```
- While it doesn't cause any issue per se, I noticed that the jobs are taking longer to complete. I can't necessarily confirm this yet, and the sample is still running, but I've attached the [SLURM](/uploads/216b227eb3b2a5a7363fc1934cdeee29/slurm-2484978.out) log.
- A little digging revealed an issue with `snakemake` when the keyword `ancient` is used. See here: https://github.com/snakemake/snakemake/issues/946
- I noticed that `imp3` uses this keyword in the [function.definitions rule](https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/function.definitions.smk)
Something to keep an eye on for the future ;)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/47
Conda envs: smaller YAML files
2022-02-04T14:55:27+01:00
Valentina Galata valentina.galata@uni.lu
I propose using smaller `conda` environment YAML files instead of the large monolithic per-step files currently in use.
According to `snakemake`'s [documentation](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#distribution-and-reproducibility) the `conda` environments should be
> as finegrained as possible to improve transparency and maintainability.
In many cases, it makes sense to have per-rule or per-tool YAML files, e.g. for `MEGAHIT` or `metaSPAdes`.
This could be discussed and decided for each step and rule to avoid installation of the same tools in different environments.
Overhaul
Valentina Galata valentina.galata@uni.lu

https://git-r3lab.uni.lu/IMP/imp3/-/issues/46
Transcriptome assembly with rnaSPAdes
2021-09-13T09:31:47+02:00
Valentina Galata valentina.galata@uni.lu
The current rule for `rnaSPAdes` includes multiple k-mers and uses `--meta`: https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/Assembly/single-omic/mt/metaspades.smk.
From [rnaSPAdes manual v3.12.0](https://cab.spbu.ru/files/release3.12.0/rnaspades_manual.html)
> rnaSPAdes works using only a single k-mer size (automatically detected using read length by the default).
> We strongly recommend not to change this parameter.
>
> rnaSPAdes is not compatible with other pipeline options such as --meta, --sc and --plasmid. If you wish to assemble metatranscriptomic data just run rnaSPAdes as it is.

https://git-r3lab.uni.lu/IMP/imp3/-/issues/45
Unit testing, CI and merging
2021-10-26T10:11:40+02:00
Valentina Galata valentina.galata@uni.lu
Consider including:
- [Snakemake unit tests](https://snakemake.readthedocs.io/en/stable/snakefiles/testing.html#snakefiles-testing) (at least for some steps/rules)
- Continuous Integration (which is currently not working)
- [Fast Forward Merge](https://docs.gitlab.com/ee/user/project/merge_requests/fast_forward_merge.html)
The current test data set might be too big for this purpose.
We might need another small and simple dummy test dataset.
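One simple way to derive such a dataset, sketched below under stated assumptions, is to keep only the first few records of each existing FASTQ file. The helper name `first_records` is hypothetical, and the sketch assumes the common four-lines-per-record FASTQ layout; whether a truncated dataset still exercises every rule of the workflow is untested.

```python
from itertools import islice


def first_records(fastq_lines, n_records):
    """Return the lines of the first n_records FASTQ records.

    Assumes the common 4-lines-per-record layout (no wrapped sequences).
    """
    return list(islice(fastq_lines, 4 * n_records))


r1 = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "GGCC", "+", "IIII"]
r2 = ["@read1/2", "TTAA", "+", "IIII", "@read2/2", "CCGG", "+", "IIII"]

# Taking the same number of records from both mates keeps the pairs in sync.
small_r1 = first_records(r1, 1)
small_r2 = first_records(r2, 1)
print(small_r1[0], small_r2[0])
```

The same function works on an open file handle, since it only iterates over lines.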
*TODOs: TBD*
Overhaul
Valentina Galata valentina.galata@uni.lu

https://git-r3lab.uni.lu/IMP/imp3/-/issues/43
Formatting convention
2021-10-08T13:45:35+02:00
Valentina Galata valentina.galata@uni.lu
### Formatting convention
Define a formatting convention and clean up the code according to the new guidelines.
The formatting convention is defined in the [Wiki](https://git-r3lab.uni.lu/IMP/imp3/-/wikis/Code-formatting-convention).
To define:
- rule structure template/guidelines
- Python:
  - [PEP8](https://www.python.org/dev/peps/pep-0008/)
  - 4 spaces instead of tabs
  - double or single quotes?
  - comments: reStructuredText or NumPy/SciPy?
- settings access
  - either only global variables or `config`
  - if global variables, then these should be defined in one place

To try:
- code quality checker (linter): `snakemake --lint`
  - highlights issues to be solved to follow best practices
  - highly recommended before publishing workflows
- automatic formatter: [Snakefmt](https://github.com/snakemake/snakefmt), based on [Black](https://black.readthedocs.io/en/stable/)

References:
- [Python docstring formats](https://realpython.com/documenting-python-code/#docstring-formats)
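
To make the docstring question concrete, here is the same small helper documented in both candidate styles. This is only an illustration: the functions `sample_name_rst` and `sample_name_numpy` and the example path are hypothetical and not taken from the IMP3 code base.

```python
import os


def sample_name_rst(path):
    """Return the sample name encoded in a FASTQ file name.

    :param path: path to a FASTQ file, e.g. ``data/S1.r1.fq.gz``
    :type path: str
    :returns: file name up to the first dot
    :rtype: str
    """
    return os.path.basename(path).split(".")[0]


def sample_name_numpy(path):
    """Return the sample name encoded in a FASTQ file name.

    Parameters
    ----------
    path : str
        Path to a FASTQ file, e.g. ``data/S1.r1.fq.gz``.

    Returns
    -------
    str
        File name up to the first dot.
    """
    return os.path.basename(path).split(".")[0]


if __name__ == "__main__":
    print(sample_name_rst("data/S1.r1.fq.gz"))
    print(sample_name_numpy("data/S1.r1.fq.gz"))
```

The reStructuredText form is more compact, while the NumPy/SciPy form takes more lines but reads more easily as plain text.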
### Code "cleaning"
- changes w.r.t. formatting conventions mentioned above
  - Python code
  - `snakemake` rules
  - (other scripts)
- add comments where possible
- one place for all helper functions
- no variables/functions inside rule files
- use `os.path` utils when working with paths
- replace code duplication with custom functions
- consistent indentation
- Q: can we simplify the structure in `workflow/rules/`?

Overhaul
Valentina Galata (valentina.galata@uni.lu)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/41
Configuration: configs, profiles and launchers
2022-04-19T13:41:44+02:00
Valentina Galata (valentina.galata@uni.lu)

Follow the recommendations from `snakemake` and the [best practices](https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html) to update and restructure the current configuration setup:
- `snakemake` config
  - move runtime specific configuration to `snakemake` profile
- `snakemake` profile
  - to set default values for command line options
- `slurm` config
  - (cluster configuration has been officially deprecated)
  - incl. in `snakemake` profile
- ~~shell-variables config~~
  - will be part of `snakemake` profile
- ~~launcher scripts~~
  - all parameters should be given via `snakemake` profiles
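
As a rough illustration of moving launcher and shell-variable settings into a profile: a Snakemake profile is a directory containing a `config.yaml` whose keys are long command-line option names. The sketch below renders such a file from a Python dictionary. The profile name `imp3_slurm` and all values are made-up placeholders, and the options are ordinary `snakemake` command-line flags picked as examples, not the settings IMP3 actually uses.

```python
import os
import tempfile

# Hypothetical defaults: keys are long snakemake command-line options
# without the leading dashes, values are placeholders.
PROFILE_DEFAULTS = {
    "jobs": 8,
    "use-conda": True,
    "latency-wait": 60,
    "rerun-incomplete": True,
    "printshellcmds": True,
}


def render_profile(options):
    """Render a flat option dictionary as simple YAML text."""
    lines = []
    for key, value in options.items():
        # Booleans must be checked explicitly so they become true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


def write_profile(profile_dir, options):
    """Write <profile_dir>/config.yaml and return its path."""
    os.makedirs(profile_dir, exist_ok=True)
    path = os.path.join(profile_dir, "config.yaml")
    with open(path, "w") as handle:
        handle.write(render_profile(options))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_profile(os.path.join(tmp, "imp3_slurm"), PROFILE_DEFAULTS)
        with open(path) as handle:
            print(handle.read(), end="")
```

A profile written this way would be selected with `snakemake --profile <profile_dir>`, which could replace a launcher script that only sets default options.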
### References
- [Snakemake: profiles](https://snakemake.readthedocs.io/en/stable/executing/cli.html?highlight=profile#profiles)
- [Template Snakemake profile for SLURM](https://github.com/Snakemake-Profiles/slurm)

Overhaul
Valentina Galata (valentina.galata@uni.lu)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/38
nonpareil: simplify the rule
2022-02-10T07:35:22+01:00
Valentina Galata (valentina.galata@uni.lu)

Simplify the rule `nonpareil` in [nonpareil.smk](https://git-r3lab.uni.lu/IMP/imp3/-/blob/master/workflow/rules/Preprocessing/nonpareil.smk) and improve the formatting if possible.
For future reference:
- `nonpareil` does not take gzipped FASTQ files: see [issue 35](https://github.com/lmrodriguezr/nonpareil/issues/35) --> might get fixed in new version
- `nonpareil` runs are not completely reproducible: see [issue 48](https://github.com/lmrodriguezr/nonpareil/issues/48) --> might get fixed in new version

Overhaul
Valentina Galata (valentina.galata@uni.lu)

https://git-r3lab.uni.lu/IMP/imp3/-/issues/36
Assembly: MG/MT assembly with MetaSPAdes
2021-08-23T11:02:03+02:00
Valentina Galata (valentina.galata@uni.lu)

Implement MG/MT assembly using `metaspades`.
Currently, only `megahit` can be used for the MG/MT ("hybrid") assembly.