IMP issues
https://git-r3lab.uni.lu/IMP/IMP/-/issues
2018-10-04T14:26:35+02:00

https://git-r3lab.uni.lu/IMP/IMP/-/issues/141
IMP-specific prokka conda recipe needed
2018-10-04T14:26:35+02:00
Cedric Laczny

`prokka` comes by default with `tbl2asn`.
While this makes sense when using prokka for isolate genomes, the use of this step for metagenomes is less apparent.
In particular, this step is very time-consuming, i.e., can take 10 hours or more.
Moreover, the output is currently not used at all by the IMP pipeline.
Hence, creating a dedicated recipe for prokka is suggested.
A dedicated recipe would allow us to apply a patch to prokka's perl script to disable the tbl2asn step.
While the IMP installation instructions could instead tell users to comment out the respective line when using the prokka bioconda recipe, a dedicated recipe seems to be the cleaner solution.

https://git-r3lab.uni.lu/IMP/IMP/-/issues/140
Why are Fasta index and BAM index not created by distinct rules, but in the variant_calling rule?
2018-10-01T15:52:58+02:00
Cedric Laczny

The following seems to be suboptimal to me:
https://git-r3lab.uni.lu/IMP/IMP/blob/master/rules/Analysis/variant.rule#L11
https://git-r3lab.uni.lu/IMP/IMP/blob/master/rules/Analysis/variant.rule#L18
Shouldn't the respective files be part of the `input:` **and** have a respective rule, rather than being created within the `variant_calling` rule, if they do not originally exist?
`{input[1]}.bai` is probably less of an issue as it is *independent* for metaG and metaT, but since `{input[0]}.fai` is the merged assembly of both types (`mg` and `mt`), it needs to be created only *once*.
Could executing this rule for `mg` and `mt` *in parallel* cause collisions or overwritten files here?
Best,
Cedric

https://git-r3lab.uni.lu/IMP/IMP/-/issues/137
samtools-1.9 might break host filtering step
2018-10-08T16:28:24+02:00
Cedric Laczny

As part of the effort to enable conda-based installation, it occurred that `samtools-1.9` was installed although `0.1.19` **should** have been installed (iris had the newer version for an as-yet unknown reason: `all.yaml` specified `samtools==0.1.19` and was **identical** between iris and litcrit).
It is speculated that this led to the host filtering step failing on iris:
```
rule mt_filtering:
input: Preprocessing/mt.r1.trimmed.rna_filtered.fq, Preprocessing/mt.r2.trimmed.rna_filtered.fq, Preprocessing/mt.se.trimmed.rna_filtered.fq,
/home/users/claczny/projects/imp-devel/dbs/hg38-db/filtering/hg38.fa, /home/users/claczny/projects/imp-devel/dbs/hg38-db/filtering/hg38.fa, /home/
users/claczny/projects/imp-devel/dbs/hg38-db/filtering/hg38.fa.amb, /home/users/claczny/projects/imp-devel/dbs/hg38-db/filtering/hg38.fa.ann, /hom
e/users/claczny/projects/imp-devel/dbs/hg38-db/filtering/hg38.fa.bwt, /home/users/claczny/projects/imp-devel/dbs/hg38-db/filtering/hg38.fa.pac, /h
ome/users/claczny/projects/imp-devel/dbs/hg38-db/filtering/hg38.fa.sa
output: Preprocessing/mt.r1.trimmed.rna_filtered.hg38_filtered.fq, Preprocessing/mt.r2.trimmed.rna_filtered.hg38_filtered.fq, Preprocessing/mt
.se.trimmed.rna_filtered.hg38_filtered.fq
jobid: 13
TMP_FILE=$(mktemp --tmpdir=/tmp -t "alignment_XXXXXX.bam")
BUFFER=$(mktemp --tmpdir=/tmp -t "alignment_XXXXXX.bam")
bwa mem -v 1 -t 26 /home/users/claczny/projects/imp-devel/dbs/hg38-db/filtering/hg38.fa Preprocessing/mt.r1.trimmed.rna_filtered.fq Prepro
cessing/mt.r2.trimmed.rna_filtered.fq | samtools view -@ 26 -bS - > $TMP_FILE
samtools merge -@ 26 -u - <(samtools view -@ 26 -u -f 4 -F 264 $TMP_FILE) <(samtools view -@ 26 -u -f 8 -F 260 $TMP_FILE)
<(samtools view -@ 26 -u -f 12 -F 256 $TMP_FILE) | samtools view -@ 26 -bF 0x800 - | samtools sort -o -@ 26 -m 2G -n - $BUFFER -
| bamToFastq -i stdin -fq Preprocessing/mt.r1.trimmed.rna_filtered.hg38_filtered.fq -fq2 Preprocessing/mt.r2.trimmed.rna_filtered.hg38_fi
ltered.fq
if [[ -s Preprocessing/mt.se.trimmed.rna_filtered.fq ]]
then
bwa mem -v 1 -t 26 /home/users/claczny/projects/imp-devel/dbs/hg38-db/filtering/hg38.fa Preprocessing/mt.se.trimmed.rna_filtered.fq | samt
ools view -@ 26 -bS - | samtools view -@ 26 -uf 4 - | bamToFastq -i stdin -fq Preprocessing/mt.se.trimmed.rna_filtered.hg38_filtered.fq
else
echo "Preprocessing/mt.se.trimmed.rna_filtered.fq is empty, skipping single end human sequence filtering, but creating it anyway..."
touch Preprocessing/mt.se.trimmed.rna_filtered.hg38_filtered.fq
fi
rm -rf $BUFFER* $TMP_FILE
[...]
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[main] Version: 0.7.9a-r786
[main] CMD: bwa mem -v 1 -t 26 /home/users/claczny/projects/imp-devel/dbs/hg38-db/filtering/hg38.fa Preprocessing/mt.r1.trimmed.rna_filtered.fq Preprocessing/mt.r2.trimmed.rna_filtered.fq
[main] Real time: 255.606 sec; CPU: 4680.023 sec
Usage: samtools sort [options...] [in.bam]
Options:
-l INT Set compression level, from 0 (uncompressed) to 9 (best)
-m INT Set maximum memory per thread; suffix K/M/G recognized [768M]
-n Sort by read name
-t TAG Sort by value of TAG. Uses position as secondary index (or read name if -n is set)
-o FILE Write final output to FILE rather than standard output
-T PREFIX Write temporary files to PREFIX.nnnn.bam
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
-@, --threads INT
Number of additional threads to use [0]
[W::bam_merge_core2] No @HD tag found.
Error in rule mt_filtering:
jobid: 13
output: Preprocessing/mt.r1.trimmed.rna_filtered.hg38_filtered.fq, Preprocessing/mt.r2.trimmed.rna_filtered.hg38_filtered.fq, Preprocessing/mt.se.trimmed.rna_filtered.hg38_filtered.fq
```
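The usage message in the log above lists `-o FILE`, whereas the rule calls `samtools sort -o ... - $BUFFER`, the older `in.bam out.prefix` form in which `-o` takes no argument and means "write to standard output". That would be consistent with the speculation. A minimal Python sketch of choosing the argument list by version is given here. The function names, the temporary prefix, and the cut-off of 1.3 for the removal of the legacy form are assumptions for illustration and are not part of IMP.

```python
import re


def parse_samtools_version(text):
    """Return the first dotted version number found in `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def name_sort_to_stdout_args(version, threads, mem_per_thread, tmp_prefix):
    """Build an argv that name-sorts BAM from stdin and writes to stdout.

    Assumption: the legacy `in.bam out.prefix` form is unavailable from 1.3 on.
    """
    common = ["-n", "-@", str(threads), "-m", mem_per_thread]
    if version >= (1, 3):
        # Newer form: temporary files go to -T PREFIX, stdout is the default output.
        return ["samtools", "sort"] + common + ["-T", tmp_prefix, "-"]
    # Legacy form: a bare -o means stdout, followed by input and output prefix.
    return ["samtools", "sort", "-o"] + common + ["-", tmp_prefix]


if __name__ == "__main__":
    for banner in ("samtools 1.9", "Version: 0.1.19-44428cd"):
        version = parse_samtools_version(banner)
        print(version, " ".join(name_sort_to_stdout_args(version, 26, "2G", "/tmp/alignment")))
```

Pinning the version in the environment remains the simpler fix; a check like this would only make a mismatch fail early and visibly.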
This needs further inspection.

https://git-r3lab.uni.lu/IMP/IMP/-/issues/105
megahit iterations: 99-25=74 modulo 4 is 2 and not 0
2017-10-04T12:29:15+02:00
Patrick May

You are producing k-mer assemblies with megahit from 25 to 97 with an increment of 4, plus an additional 99 k-mer assembly, which is an artefact of the megahit tool.
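The arithmetic behind this k-mer series can be checked with a few lines of Python. The variable names are illustrative, and the rule that the maximum is appended when the stepping does not land on it is an assumption based on the behaviour described in this issue.

```python
# Illustrative names; the values are the ones quoted in this issue.
k_min, k_max, k_step = 25, 99, 4

# Values reached by stepping from k_min without exceeding k_max.
stepped = list(range(k_min, k_max + 1, k_step))

# Assumption: the maximum is appended when the stepping does not land on it.
k_list = stepped if stepped[-1] == k_max else stepped + [k_max]

remainder = (k_max - k_min) % k_step

print(stepped[-1])   # 97
print(k_list[-2:])   # [97, 99]
print(remainder)     # 2
```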
At some point you should change this, also in the documentation; it is misleading because 99-25=74 modulo 4 is 2 and not 0.

https://git-r3lab.uni.lu/IMP/IMP/-/issues/91
Custom screening does not work well for launching in batches
2019-08-27T08:50:22+02:00
Shaman Narayanasamy

When launching several instances of IMP in parallel, the screen parameter seems to index the fasta file each time IMP is launched. This is not an issue in a normal scenario, but in parallel all the IMP instances clash while trying to index the same file, which corrupts the process, which is then terminated. Not sure why Snakemake doesn't recognize the existence of the bwa index files that were created previously. In addition, if a given screen file is very big, we would ideally want to index it only once. For example, the human genome takes two hours to index...
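One possible guard against the clash described above is to skip indexing when all index files already exist and to let only one process build them. The Python sketch below is an illustration and not IMP code: `index_once`, the lock file name, and the `build_index` callback (which would wrap the actual `bwa index` call) are hypothetical, and the suffix list is taken from the bwa index files visible in the log of issue 137.

```python
import os

# bwa index files as they appear in the mt_filtering log of issue 137.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def index_is_complete(fasta):
    """True when every expected index file exists next to the fasta file."""
    return all(os.path.exists(fasta + suffix) for suffix in BWA_INDEX_SUFFIXES)


def index_once(fasta, build_index):
    """Build the index at most once; return 'skipped', 'busy' or 'built'."""
    if index_is_complete(fasta):
        return "skipped"
    lock = fasta + ".index.lock"  # hypothetical lock file name
    try:
        # O_EXCL makes the creation atomic: only one process can succeed.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return "busy"  # another instance is indexing; the caller may wait and retry
    try:
        os.close(fd)
        if index_is_complete(fasta):
            return "skipped"
        build_index(fasta)
        return "built"
    finally:
        os.remove(lock)
```

Because the check looks at file existence rather than modification times, it would not be triggered by a fasta file that is merely newer than its index. A stale lock left behind by a killed process would need separate handling.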
-Shaman-
Edit:
So, it looks like we figured out why this is happening. The `--screen` parameter first copies the relevant fasta file into the `~/database/filtering` (or `db/filtering`) folder. As a result, the index files (from the previous run) have an earlier time stamp than the freshly copied fasta file. Therefore `Snakemake` invokes the indexing step again. @yjarosz, any idea what the best way to solve this is? I was thinking that we could add some conditions (`bash`) within the rules to deal with it. Let me know what you think.
Assignee: Yohan Jarosz (yohan.jarosz@uni.lu)

https://git-r3lab.uni.lu/IMP/IMP/-/issues/79
Metaquast not working
2017-11-10T03:44:08+01:00
Shaman Narayanasamy

MetaQUAST report does not appear in the IMP main report
Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/64
Tracking outreach
2018-02-09T09:10:34+01:00
Cedric Laczny

Hi,
how about using http://www.altmetric.com/products/altmetric-badges/ in order to create a simple outreach-tracker and put it on the IMP website, e.g.:
```
<div data-badge-popover="right" data-badge-type="medium-donut" data-doi="10.1101/039263" data-hide-no-mentions="true" class="altmetric-embed"></div>
```
This is what the Altmetric details of the IMP preprint currently look like:
http://www.altmetric.com/details/5304482
Assignee: Yohan Jarosz (yohan.jarosz@uni.lu)

https://git-r3lab.uni.lu/IMP/IMP/-/issues/63
R script reading of GFF files
2018-02-09T09:10:34+01:00
Shaman Narayanasamy

In certain data sets, the error below occurs.
```
[1] "Read in gff3 annotation file"
Read 254171 records
Error in validObject(.Object) :
invalid class "Genome_intervals_stranded" object: The 'annotation' slot should have a column named 'inter_base' that is logical and does not contain missing values.
Calls: readGff3 ... eval -> eval -> .nextMethod -> initMatrix -> validObject
In addition: Warning message:
In readGff3(annot_file, isRightOpen = TRUE) :
'readGff3' has changed to closed interval conventions!
Use 'isRightOpen=TRUE' to restore the previous behavior
that allowed for zero-length features. Alternatively, use
the readZeroLengthFeaturesGff3 function instead.
You can turn off this warning by setting 'quiet=TRUE'
Execution halted
```
These cases need to be handled appropriately so that the pipeline doesn't break on this error.
Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/60
IMP execution outside of code directory
2018-02-09T09:10:34+01:00
Shaman Narayanasamy

IMP is currently not executable outside of the directory containing the code. Or I am doing it the wrong way. This is an issue because it is not efficient. At present, everyone running IMP has their own copy of the repository, the databases etc. Is it possible to have a centralized repository which everyone can run from?
Below is the error I get when trying to execute outside the IMP directory.
```
fatal: Not a git repository (or any parent up to mount point /mnt/md1200)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Traceback (most recent call last):
File "/home/snarayanasamy/Work/tools/IMP/IMP", line 252, in <module>
args = docopt(__doc__, version=get_git_version(), options_first=True)
File "/home/snarayanasamy/Work/tools/IMP/IMP", line 122, in get_git_version
['git', '--no-pager', 'log', '-n', '1', '--pretty=format:%H']
File "/usr/local/lib/python3.4/subprocess.py", line 620, in check_output
raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command '['git', '--no-pager', 'log', '-n', '1', '--pretty=format:%H']' returned non-zero exit status 128
```
Is there any way we can fix this? My IMP code directory is getting so messy with all the log files etc :six:
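The traceback shows `get_git_version` running `git log` in the current working directory, which fails when that directory is not inside the repository. A minimal sketch of one possible fix follows: run git in the directory that holds the code and fall back to a placeholder when no version can be determined. The `code_dir` and `default` parameters are hypothetical additions, and this is not the actual IMP implementation.

```python
import subprocess


def get_git_version(code_dir, default="unknown"):
    """Return the latest commit hash of the repository at `code_dir`.

    Falls back to `default` when git fails, is not installed, or the
    directory does not exist, instead of raising.
    """
    command = ["git", "--no-pager", "log", "-n", "1", "--pretty=format:%H"]
    try:
        output = subprocess.check_output(
            command, cwd=code_dir, stderr=subprocess.DEVNULL
        )
    except (subprocess.CalledProcessError, OSError):
        return default
    return output.decode().strip() or default
```

In the launcher script, `code_dir` could be derived from the script location, for example `os.path.dirname(os.path.realpath(__file__))`, so that the working directory no longer matters.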
Assignee: Yohan Jarosz (yohan.jarosz@uni.lu)

https://git-r3lab.uni.lu/IMP/IMP/-/issues/58
Work on snakemake step names
2018-02-09T09:10:34+01:00
Shaman Narayanasamy

Change names to be more meaningful and understandable.
NOTE: Remember to also change log file names
Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/56
Update MEGAHIT
2018-02-09T09:10:34+01:00
Shaman Narayanasamy

Update to the latest version that accommodates paired-end information.
Update the corresponding MEGAHIT command within the snakemake rules.
Assignee: Yohan Jarosz (yohan.jarosz@uni.lu)

https://git-r3lab.uni.lu/IMP/IMP/-/issues/49
R migrate functions
2018-02-09T09:10:34+01:00
Shaman Narayanasamy

Clean up R functions from the plot script by moving them to a function script.
Make sure the handling of NAs is optimal!!
Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/48
MG deduplication step not appearing in snakemake stdout.
2018-02-09T09:10:34+01:00
Shaman Narayanasamy

Files are generated, but there seems to be no stdout for that step...
Looks like it does appear in the log, but it appears after the annotation rule... I wonder if the analysis went fine, if this was the case...
This behaviour seems consistent across all analysed samples. Wonder if the analysis itself is fine...
```
rule ANALYSIS_ANNOTATE:
input: /output/Assembly/MGMT.assembly.merged.fa, /databases/cm/Bacteria.i1i, /databases/genus/Staphylococcus.phr, /databases/hmm/CLUSTERS.hmm.h3f, /databases/kingdom/Archaea/sprot.phr
output: /output/Analysis/annotation/annotation.filt.gff
log: /output/Analysis/Analysis.log
benchmark: /output/Analysis/benchmarks/ANALYSIS_ANNOTATE.json
Softlinking /usr/bin/../db to /databases
18 of 40 steps (45%) done
rule PREPROCESSING_MG_DEDUPLICATE:
input: /output/Preprocessing/MG.R1.fq, /output/Preprocessing/MG.R2.fq
output: /output/Preprocessing/MG.R1.uniq.fq, /output/Preprocessing/MG.R2.uniq.fq
log: /output/Preprocessing/Preprocessing.log
benchmark: /output/Preprocessing/benchmarks/PREPROCESSING_MG_DEDUPLICATE.json
19 of 40 steps (48%) done
rule ANALYSIS_MG_READ_COUNT:
input: /output/Preprocessing/MG.R1.fq, /output/Preprocessing/MG.R2.fq, /output/Preprocessing/MG.R1.uniq.fq, /output/Preprocessing/MG.R2.uniq.fq, /output/Preprocessing/MG.R1.uniq.trimmed.fq, /output/Preprocessing/MG.R2.uniq.trimmed.fq, /output/Preprocessing/MG.SE.uniq.trimmed.fq, /output/Preprocessing/MG.R1.uniq.trimmed.hg38.fq, /output/Preprocessing/MG.R2.uniq.trimmed.hg38.fq, /output/Preprocessing/MG.SE.uniq.trimmed.hg38.fq
output: /output/Analysis/MG.read_counts.txt
log: /output/Analysis/Analysis.log
benchmark: /output/Analysis/benchmarks/ANALYSIS_MG_READ_COUNT.json
```
Assignee: Yohan Jarosz (yohan.jarosz@uni.lu)

https://git-r3lab.uni.lu/IMP/IMP/-/issues/45
Changing ownership of files within docker
2018-02-09T09:10:34+01:00
Shaman Narayanasamy

Is it possible to change ownership of files within docker? For instance, create a rule that converts the directory and file ownership to the user that launched the job.
I realized that we are having issues moving the files because we do not own them. This is rather inconvenient as the user needs to be root in order to change the file. It is not a problem at present for bigbug users as there are only two people actively running IMP, but will be in the future...
Thanks @anne.kaysen for bringing this to light.
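A final step of the kind suggested above could be sketched in Python as follows. It assumes the numeric user and group IDs of the launching user are passed into the container in some way (for instance through environment variables, which is a hypothetical choice here) and that the process inside the container is allowed to change ownership. `chown_tree` is an illustrative name, not an existing IMP function.

```python
import os


def chown_tree(root, uid, gid):
    """Give `root` and everything below it to the user `uid` and group `gid`."""
    os.chown(root, uid, gid)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lchown changes a symlink itself rather than the file it points to.
            os.lchown(os.path.join(dirpath, name), uid, gid)
```

A call such as `chown_tree("/output", int(os.environ["HOST_UID"]), int(os.environ["HOST_GID"]))` would then hand the results back to the launching user; `HOST_UID` and `HOST_GID` are made-up variable names.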
Update: Some googling showed me that this is quite a troublesome and complicated task. Maybe this might work:
http://stackoverflow.com/questions/26500270/understanding-user-file-ownership-in-docker-how-to-avoid-changing-permissions-o
Assignee: Yohan Jarosz (yohan.jarosz@uni.lu)

https://git-r3lab.uni.lu/IMP/IMP/-/issues/42
Human sequence filtering sensitivity
2018-02-09T09:10:34+01:00
Shaman Narayanasamy

The human genome sequence mapping using BWA is extremely sensitive. I realized this when testing on a simulated bacterial community. It filters out too many sequences even though the simulated sequences should not contain any human sequences.
Consider changing the filtering parameters or using another program for this purpose.
Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/40
INDEX_FASTA_FILE: must go in init
2018-02-09T09:10:34+01:00
Yohan Jarosz (yohan.jarosz@uni.lu)
Assignee: Yohan Jarosz (yohan.jarosz@uni.lu)

https://git-r3lab.uni.lu/IMP/IMP/-/issues/37
Different processes can access db at the same time
2018-02-09T09:10:34+01:00
Yohan Jarosz (yohan.jarosz@uni.lu)

We should separate the initialisation of the databases from snakemake: if you run multiple jobs in parallel, one can simply re-generate the databases while another is using them. This could lead to a lot of trouble...
Assignee: Yohan Jarosz (yohan.jarosz@uni.lu)

https://git-r3lab.uni.lu/IMP/IMP/-/issues/35
At initializing, IMP downloads some databases and tools over the network. Sometimes it fails.
2018-02-09T09:10:34+01:00
Yohan Jarosz (yohan.jarosz@uni.lu)

It already occurs with prokka as the server is located in Australia.
Also with htqc, the authors have removed the version we used from download, so we had to update.
For the sake of reproducibility, external databases and tools should be mirrored whenever possible.
Assignee: Yohan Jarosz (yohan.jarosz@uni.lu)

https://git-r3lab.uni.lu/IMP/IMP/-/issues/34
By default, human filtering is set on chr21
2018-02-09T09:10:34+01:00
Yohan Jarosz (yohan.jarosz@uni.lu)
Assignee: Yohan Jarosz (yohan.jarosz@uni.lu)

https://git-r3lab.uni.lu/IMP/IMP/-/issues/32
IMP on bigbug
2018-02-09T09:10:34+01:00
Shaman Narayanasamy

Implement IMP on bigbug and run on real data.
Assignee: Shaman Narayanasamy