IMP issues
https://git-r3lab.uni.lu/IMP/IMP/-/issues
2019-08-27T08:50:22+02:00

https://git-r3lab.uni.lu/IMP/IMP/-/issues/91
Custom screening does not work well for launching in batches
2019-08-27T08:50:22+02:00
Author: Shaman Narayanasamy

When launching several instances of IMP in parallel, the screen parameter seems to index the fasta file each time IMP is launched. This is not an issue in a normal scenario, but in parallel all the IMP instances clash while trying to index the same file, which corrupts the process, which is then terminated. Not sure why Snakemake doesn't recognize the existence of the bwa index files that were created previously. In addition, if a given screen file is very big, we would ideally want to index it only once. For example, the human genome takes two hours to index...
-Shaman-
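One possible way to keep parallel launches from clashing is sketched below: take an exclusive lock before indexing, and skip the step when a complete set of bwa index files is already there. This is only an illustration, not how IMP currently works. The function names, the `.lock` file next to the fasta file and the `indexer` argument are all made up for the example; only the five index suffixes are what `bwa index` normally writes. `fcntl.flock` is Unix-only and may not be reliable on every network file system.

```python
import fcntl
import os
import subprocess

# Index files written by `bwa index`, named after the fasta file.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def index_is_complete(fasta):
    """Return True if every bwa index file for `fasta` already exists."""
    return all(os.path.exists(fasta + suffix) for suffix in BWA_INDEX_SUFFIXES)


def index_once(fasta, indexer=("bwa", "index")):
    """Index `fasta` unless a complete index exists; return True if indexing ran.

    An exclusive lock on a hypothetical `<fasta>.lock` file makes parallel
    callers wait instead of writing the same index files at the same time.
    """
    with open(fasta + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            if index_is_complete(fasta):
                return False
            subprocess.check_call(list(indexer) + [fasta])
            return True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

With something like this, the first instance to arrive does the indexing while the others wait on the lock and then find the finished index.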
Edit:
So, it looks like we figured out why this is happening. The `--screen` parameter first copies the relevant fasta file into the `~/database/filtering` (or `db/filtering`) folder. This leaves the index files (from the previous run) with an earlier time stamp than the freshly copied fasta file, so `Snakemake` treats them as out of date and invokes the indexing rule again. @yjarosz, any idea what the best way to solve this is? I was thinking that we could add some conditions (`bash`) within the rules to deal with it. Let me know what you think.

Assignee: Yohan Jarosz <yohan.jarosz@uni.lu>

https://git-r3lab.uni.lu/IMP/IMP/-/issues/64
Tracking outreach
2018-02-09T09:10:34+01:00
Author: Cedric Laczny

Hi,
how about using http://www.altmetric.com/products/altmetric-badges/ in order to create a simple outreach-tracker and put it on the IMP website, e.g.:
```
<div data-badge-popover="right" data-badge-type="medium-donut" data-doi="10.1101/039263 " data-hide-no-mentions="true" class="altmetric-embed"></div>
```
This is what the Altmetric page for the IMP preprint currently looks like:
http://www.altmetric.com/details/5304482
Assignee: Yohan Jarosz <yohan.jarosz@uni.lu>

https://git-r3lab.uni.lu/IMP/IMP/-/issues/63
R script reading of GFF files
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

In certain data sets, the error below occurs.
```
[1] "Read in gff3 annotation file"
Read 254171 records
Error in validObject(.Object) :
invalid class "Genome_intervals_stranded" object: The 'annotation' slot should have a column named 'inter_base' that is logical and does not contain missing values.
Calls: readGff3 ... eval -> eval -> .nextMethod -> initMatrix -> validObject
In addition: Warning message:
In readGff3(annot_file, isRightOpen = TRUE) :
'readGff3' has changed to closed interval conventions!
Use 'isRightOpen=TRUE' to restore the previous behavior
that allowed for zero-length features. Alternatively, use
the readZeroLengthFeaturesGff3 function instead.
You can turn off this warning by setting 'quiet=TRUE'
Execution halted
```
We need to handle these cases appropriately so that the pipeline doesn't break on this error.

Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/60
IMP execution outside of code directory
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

IMP is currently not executable outside of the directory containing the code. Or I am doing it the wrong way. This is an issue because it is not efficient. At present, everyone running IMP has their own copy of the repository, the databases etc. Is it possible to have a centralized repository which everyone can run from?
Below is the error I get when trying to execute outside the IMP directory.
```
fatal: Not a git repository (or any parent up to mount point /mnt/md1200)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Traceback (most recent call last):
File "/home/snarayanasamy/Work/tools/IMP/IMP", line 252, in <module>
args = docopt(__doc__, version=get_git_version(), options_first=True)
File "/home/snarayanasamy/Work/tools/IMP/IMP", line 122, in get_git_version
['git', '--no-pager', 'log', '-n', '1', '--pretty=format:%H']
File "/usr/local/lib/python3.4/subprocess.py", line 620, in check_output
raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command '['git', '--no-pager', 'log', '-n', '1', '--pretty=format:%H']' returned non-zero exit status 128
```
Is there any way we can fix this? My IMP code directory is getting so messy with all the log files etc :six:
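
A minimal sketch of one way around the traceback above: run `git` in the directory that holds the IMP script instead of the current working directory, and fall back to a placeholder if the lookup still fails. `get_git_version` and the `git` command line are taken from the traceback; the `repo_dir` argument and the `"unknown"` fallback are assumptions of this sketch, not existing IMP behaviour.

```python
import os
import subprocess


def get_git_version(repo_dir=None):
    """Return the current commit hash of the IMP checkout, or "unknown".

    `repo_dir` defaults to the directory containing this script, so the
    lookup no longer depends on the directory IMP is launched from.
    """
    if repo_dir is None:
        repo_dir = os.path.dirname(os.path.realpath(__file__))
    try:
        out = subprocess.check_output(
            ["git", "--no-pager", "log", "-n", "1", "--pretty=format:%H"],
            cwd=repo_dir,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.CalledProcessError, OSError):
        # Not a git checkout, no commits yet, or git is not installed.
        return "unknown"
    return out.decode().strip()
```

The `docopt(__doc__, version=get_git_version(), options_first=True)` call would then no longer crash when IMP is started from another directory. Whether the rest of IMP (Snakefile, databases) also resolves its paths relative to the code directory is a separate question that this does not address.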
Assignee: Yohan Jarosz <yohan.jarosz@uni.lu>

https://git-r3lab.uni.lu/IMP/IMP/-/issues/58
Work on snakemake step names
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

Change names to be more meaningful and understandable.
NOTE: Remember to also change log file names
Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/56
Update MEGAHIT
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

Update to the latest version that accommodates paired end information.
Update the corresponding command for MEGAHIT within the snakemake rules.

Assignee: Yohan Jarosz <yohan.jarosz@uni.lu>

https://git-r3lab.uni.lu/IMP/IMP/-/issues/49
R migrate functions
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

Clean up R functions from the plot script by moving them to a function script.
Make sure the handling of NAs is optimal!!

Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/48
MG deduplication step not appearing in snakemake stdout.
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

Files are generated, but there seems to be no stdout for that step...
Looks like it does appear in the log, but it appears after the annotation rule... I wonder if the analysis went fine, if this was the case...
This behaviour seems consistent across all analysed samples. Wonder if the analysis itself is fine...
```
rule ANALYSIS_ANNOTATE:
input: /output/Assembly/MGMT.assembly.merged.fa, /databases/cm/Bacteria.i1i, /databases/genus/Staphylococcus.phr, /databases/hmm/CLUSTERS.hmm.h3f, /databases/kingdom/Archaea/sprot.phr
output: /output/Analysis/annotation/annotation.filt.gff
log: /output/Analysis/Analysis.log
benchmark: /output/Analysis/benchmarks/ANALYSIS_ANNOTATE.json
Softlinking /usr/bin/../db to /databases
18 of 40 steps (45%) done
rule PREPROCESSING_MG_DEDUPLICATE:
input: /output/Preprocessing/MG.R1.fq, /output/Preprocessing/MG.R2.fq
output: /output/Preprocessing/MG.R1.uniq.fq, /output/Preprocessing/MG.R2.uniq.fq
log: /output/Preprocessing/Preprocessing.log
benchmark: /output/Preprocessing/benchmarks/PREPROCESSING_MG_DEDUPLICATE.json
19 of 40 steps (48%) done
rule ANALYSIS_MG_READ_COUNT:
input: /output/Preprocessing/MG.R1.fq, /output/Preprocessing/MG.R2.fq, /output/Preprocessing/MG.R1.uniq.fq, /output/Preprocessing/MG.R2.uniq.fq, /output/Preprocessing/MG.R1.uniq.trimmed.fq, /output/Preprocessing/MG.R2.uniq.trimmed.fq, /output/Preprocessing/MG.SE.uniq.trimmed.fq, /output/Preprocessing/MG.R1.uniq.trimmed.hg38.fq, /output/Preprocessing/MG.R2.uniq.trimmed.hg38.fq, /output/Preprocessing/MG.SE.uniq.trimmed.hg38.fq
output: /output/Analysis/MG.read_counts.txt
log: /output/Analysis/Analysis.log
benchmark: /output/Analysis/benchmarks/ANALYSIS_MG_READ_COUNT.json
```

Assignee: Yohan Jarosz <yohan.jarosz@uni.lu>

https://git-r3lab.uni.lu/IMP/IMP/-/issues/45
Changing ownership of files within docker
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

Is it possible to change ownership of files within docker? For instance, create a rule that converts the directory and file ownership to the user that launched the job.
I realized that we are having issues moving the files because we do not own them. This is rather inconvenient as the user needs to be root in order to change the file. It is not a problem at present for bigbug users as there are only two people actively running IMP, but will be in the future...
Thanks @anne.kaysen for bringing this to light.
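
A minimal sketch of such an ownership-fixing step, assuming the launching user's UID and GID are passed into the container through two hypothetical environment variables, `HOST_UID` and `HOST_GID` (IMP does not define these; the function names are made up as well). It would have to run as the last step inside the container, while the process still has the rights to change ownership.

```python
import os


def chown_tree(root, uid, gid):
    """Recursively give `root` and everything below it to `uid`:`gid`."""
    os.chown(root, uid, gid)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lchown changes a symlink itself instead of following it
            os.lchown(os.path.join(dirpath, name), uid, gid)


def restore_ownership(output_dir, environ=os.environ):
    """Hand `output_dir` back to the launching user, if one was given.

    HOST_UID and HOST_GID are hypothetical variable names; they would have
    to be set by whatever starts the container.
    """
    uid, gid = environ.get("HOST_UID"), environ.get("HOST_GID")
    if uid is None or gid is None:
        return False
    chown_tree(output_dir, int(uid), int(gid))
    return True
```

The variables could be set at launch with `docker run -e HOST_UID=$(id -u) -e HOST_GID=$(id -g) ...`. Starting the container with `docker run --user "$(id -u):$(id -g)"` might avoid the problem altogether, provided the tools inside do not need root.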
Update: Some googling showed me that this is quite a troublesome and complicated task. Maybe this might work:
http://stackoverflow.com/questions/26500270/understanding-user-file-ownership-in-docker-how-to-avoid-changing-permissions-o

Assignee: Yohan Jarosz <yohan.jarosz@uni.lu>

https://git-r3lab.uni.lu/IMP/IMP/-/issues/42
Human sequence filtering sensitivity
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

The human genome sequence mapping using BWA is extremely sensitive. I realized this when testing on a simulated bacterial community. It filters out too many sequences even though the simulated sequences should not contain any human sequences.
Consider changing the parameters of the filtering or using another program for this purpose.
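
As an illustration of what stricter filtering could look like, the sketch below only counts a read as human when its alignment has a high mapping quality and covers most of the read. It works on plain SAM lines, so it would be a post-filter on the BWA output rather than a change to BWA's own parameters. The function names and the thresholds (`min_mapq=30`, `min_fraction=0.9`) are arbitrary assumptions for the example and would need tuning on the simulated data.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_fraction(cigar):
    """Fraction of the original read's bases that are aligned (M, = or X)."""
    aligned = read_len = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            aligned += n
        if op in "MIS=XH":
            # Operations that account for bases of the read, including
            # soft-clipped (S) and hard-clipped (H) ones.
            read_len += n
    return aligned / read_len if read_len else 0.0


def is_confident_hit(sam_line, min_mapq=30, min_fraction=0.9):
    """Return True if a SAM record looks like a confident match to the reference."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq, cigar = int(fields[1]), int(fields[4]), fields[5]
    if flag & 0x4 or cigar == "*":  # read is unmapped
        return False
    return mapq >= min_mapq and aligned_fraction(cigar) >= min_fraction
```

Only reads for which `is_confident_hit` returns True would then be removed as human; short or ambiguous matches from bacterial reads would be kept.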
Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/40
INDEX_FASTA_FILE: must go in init
2018-02-09T09:10:34+01:00
Author: Yohan Jarosz <yohan.jarosz@uni.lu>

Assignee: Yohan Jarosz <yohan.jarosz@uni.lu>

https://git-r3lab.uni.lu/IMP/IMP/-/issues/37
different processes can access db at the same time
2018-02-09T09:10:34+01:00
Author: Yohan Jarosz <yohan.jarosz@uni.lu>

We should separate the initialisation of the databases from snakemake: if you run multiple jobs in parallel, one can simply regenerate the databases while another is using them. This could lead to a lot of trouble...

Assignee: Yohan Jarosz <yohan.jarosz@uni.lu>

https://git-r3lab.uni.lu/IMP/IMP/-/issues/35
At initialization, IMP downloads some databases and tools over the network. Sometimes it fails.
2018-02-09T09:10:34+01:00
Author: Yohan Jarosz <yohan.jarosz@uni.lu>

This already occurs with prokka, as the server is located in Australia.
Also with htqc: the authors have removed the version we used from their downloads, so we had to update.
For the sake of reproducibility, external databases or tools should be mirrored whenever possible.

Assignee: Yohan Jarosz <yohan.jarosz@uni.lu>

https://git-r3lab.uni.lu/IMP/IMP/-/issues/34
By default, human filtering is set on chr21
2018-02-09T09:10:34+01:00
Author: Yohan Jarosz <yohan.jarosz@uni.lu>

Assignee: Yohan Jarosz <yohan.jarosz@uni.lu>

https://git-r3lab.uni.lu/IMP/IMP/-/issues/32
IMP on bigbug
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

Implement IMP on bigbug and run on real data.

Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/30
Prokka setup db
2018-02-09T09:10:34+01:00
Author: Yohan Jarosz <yohan.jarosz@uni.lu>

Add a rule to symlink the db directory each time if the symlink does not exist

Assignee: Yohan Jarosz <yohan.jarosz@uni.lu>

https://git-r3lab.uni.lu/IMP/IMP/-/issues/25
Handling compressed files
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

IMP needs to handle various compression formats that it might encounter. I am currently testing for:
gzip
bgzip
bzip2
tar.gz

Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/18
Variant calling not working
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

After 8cea2718
Don't exactly know the issue. Looking into it.

Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/5
Variant call results
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

Check variant calling final output and make sure it was merged.

Assignee: Shaman Narayanasamy

https://git-r3lab.uni.lu/IMP/IMP/-/issues/4
Issue with downloading ec to kegg mappings
2018-02-09T09:10:34+01:00
Author: Shaman Narayanasamy

Problem with `src/make.ec.to.pwy.kegg.py` script output. Need to check script.

Assignee: Shaman Narayanasamy