ONT_pilot_gitlab issues
https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues
2020-12-08T13:48:38+01:00

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/96
Workflow: Reference-based analysis (Zymo)
2020-12-08T13:48:38+01:00
Valentina Galata <valentina.galata@uni.lu>
Create a separate workflow, i.e. folder/structure, for the reference-based analysis of the `Zymo` dataset.
The bright future

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/100
Rewriting of slurm.yaml files to match new UL HPC SLURM configurations
2020-10-28T10:19:42+01:00
Cedric Laczny
The master document, in some way: https://hpc.uni.lu/download/slides/2020-ULHPC-user-guide.pdf
The `partition`s can stay as they are, but the `qos` need to be adjusted accordingly :disappointed:
The rules with a `walltime` > `02-00:00:00` will need to be moved to `qos: long`.
What is still a bit unclear to me is how many `long` jobs can be launched per user.
According to the master document, it would be only a **single** one :scream:
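To find the affected rules, the `walltime` comparison can be scripted. The following is only a minimal sketch: the helper names `walltime_to_seconds` and `needs_long_qos` are hypothetical, and it assumes every `walltime` value uses the `D-HH:MM:SS` form shown above.

```bash
# Hypothetical helpers; assume walltimes are always written as D-HH:MM:SS.

# Convert a walltime such as 02-00:00:00 to seconds.
walltime_to_seconds() {
    echo "$1" | awk -F'[-:]' '{ print (($1 * 24 + $2) * 60 + $3) * 60 + $4 }'
}

# Succeed (exit status 0) only if the walltime is strictly longer than 2 days.
needs_long_qos() {
    [ "$(walltime_to_seconds "$1")" -gt "$(walltime_to_seconds "02-00:00:00")" ]
}

# Example: report whether a given walltime would have to move to qos: long.
if needs_long_qos "03-00:00:00"; then
    echo "03-00:00:00 exceeds 2 days"
fi
```

A walltime of exactly `02-00:00:00` is not flagged, matching the `>` in the description above.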
A quick look revealed `assembly_lr_canu` to be the only affected rule - at least in the `master` branch.
Post-Slurm2.0 cleanup
Susheel Busi
Valentina Galata <valentina.galata@uni.lu>

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/97
Analysis: metaP
2021-03-25T12:10:04+01:00
Valentina Galata <valentina.galata@uni.lu>
Code required for `metaP` analysis
Stretch goal

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/74
Validation of main config file
2020-07-31T11:18:59+02:00
Valentina Galata <valentina.galata@uni.lu>
Add a config schema file and validation step
Clean-up
Valentina Galata <valentina.galata@uni.lu>

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/50
RM partial gene counts from analysis of the main workflow
2020-07-02T12:58:45+02:00
Valentina Galata <valentina.galata@uni.lu>
Remove rule `analysis_prodigal_partial` from `workflow/step/analysis.smk` and `workflow/rules/analysis.smk`.
Number of total/partial genes should be computed in `workflow_report`.
* [x] remove from `workflow`
* [x] add to `workflow_report`
Clean-up
Valentina Galata <valentina.galata@uni.lu>

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/46
QUAST: TSV instead of TXT as output
2020-07-01T10:00:57+02:00
Valentina Galata <valentina.galata@uni.lu>
Change the output file to `report.tsv` instead of `report.txt`: easier to parse later
Clean-up
Valentina Galata <valentina.galata@uni.lu>

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/45
Execution on different datasets
2020-07-01T10:05:34+02:00
Valentina Galata <valentina.galata@uni.lu>
- check if there is no metaT data
- separate config files for each dataset in `config/`
Clean-up
Valentina Galata <valentina.galata@uni.lu>

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/35
Link contigs between assemblies
2020-11-03T12:13:38+01:00
Susheel Busi
- Verification of the code and possible improvement maybe.
- See commit# 73dd1bc8 [prepare_plot_files_w_metaspades.sh](https://git-r3lab.uni.lu/susheel.busi/ont_pilot_gitlab/-/blob/73dd1bc85e21eb93e818a14a65ce9fa0cfe5de4e/2019_GDB/scripts/scripts/prepare_plot_files_w_metaspades.sh)
The bright future
Susheel Busi
Valentina Galata <valentina.galata@uni.lu>

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/32
Sub-workflow: Use main pipeline in figure pipeline
2020-08-20T07:46:58+02:00
Valentina Galata <valentina.galata@uni.lu>
Add the main pipeline as a sub-workflow in the figure `snakemake` file: use it for files containing data required to create figures/tables.
Manuscript - initial version
Valentina Galata <valentina.galata@uni.lu>

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/27
Clean-up: data for figures
2020-07-02T13:55:50+02:00
Valentina Galata <valentina.galata@uni.lu>
Ideally, all data files required for figures should be created by a `snakemake` file and should not be stored in the repository.
Currently, in branch `figures_valentina`, `figures/data/` contains various files. These should be created by a rule in either `figures/figures.smk` or other `snakemake` file outside of the `figures/` folder.
Manuscript - initial version

https://git-r3lab.uni.lu/ESB/ont_pilot_gitlab/-/issues/23
Analysis/Figure: mappability index
2020-07-02T09:14:35+02:00
Susheel Busi
- Rather than putting the following in the "analysis" snakefile, can we code this into the Rscript when making the figures?
```bash
# using the mappability_index files. example: `hybrid_mmi_merged.txt` file in the `results/mapping` folder, which will be created using the snakefile I'm making.
for file in *.txt; do echo "$file"; grep -E 'total | mapped' "$file" | head -n 2; done | paste - - - | \
sed $'1 i\\\nsample\ttotal\tmapped\tpercent_mapped' | sed 's/ + 0 in total (QC-passed reads + QC-failed reads)//g' | \
sed 's/ + 0 mapped//g' | sed 's/ : N\/A)//g' | sed 's/(//g' | sed 's/\.txt//g' | \
awk '{print $1"\t"$2"\t"$3"\t"$4}' > mappability_index.tsv
```
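As a possible alternative to the `sed` chain, the same row could be pulled out with a single `awk` call per file. This is only a sketch: `flagstat_row` is a hypothetical helper name, and it assumes the input files are `samtools flagstat`-style text in which the overall lines read `N + 0 in total (...)` and `N + 0 mapped (P% : N/A)`, as the patterns above suggest.

```bash
# Hypothetical helper: print sample, total, mapped and percent_mapped
# (tab-separated) for one flagstat-style text file.
flagstat_row() {
    awk -v sample="$(basename "$1" .txt)" '
        $4 == "in" && $5 == "total" { total = $1 }                      # "N + 0 in total (...)"
        $4 == "mapped"              { mapped = $1; pct = substr($5, 2) } # "N + 0 mapped (P% : N/A)"
        END { printf "%s\t%s\t%s\t%s\n", sample, total, mapped, pct }
    ' "$1"
}

# Build the same table as above: header first, then one row per file.
mappability_table() {
    printf 'sample\ttotal\tmapped\tpercent_mapped\n'
    for f in "$@"; do
        flagstat_row "$f"
    done
}
```

Calling `mappability_table *.txt > mappability_index.tsv` would then produce the table in one step. Matching on whole fields rather than on substrings means other lines that merely contain the word `mapped` are ignored.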
And the same for the counts where contigs smaller than 1 kb - 5 kb are removed. Examples below:
```bash
for file in *counts.txt
do
echo "$file"
awk '$2 >= 1000 {sum += $3} END {print sum + 0}' "$file" # sum of the number of reads (3rd column) where the 2nd column is greater than or equal to 1000
awk '{sum += $3 + $4} END {print sum + 0}' "$file" # sum of the 3rd and 4th columns to get the total reads
# compute the percentage from columns 2 and 3 and write it to column 4
done | paste - - - | awk -v OFS='\t' '{ $4 = ($3 > 0) ? 100 * $2 / $3 : 0; print }' | \
sed $'1 i\\\nsample\tmapped_reads\ttotal\tpercent_mapped' > mappability_over_1000bp.txt
# collating the sums for contigs larger than 1500 bp
for file in *counts.txt
do
echo "$file"
awk '$2 >= 1500 {sum += $3} END {print sum + 0}' "$file"
awk '{sum += $3 + $4} END {print sum + 0}' "$file"
done | paste - - - | awk -v OFS='\t' '{ $4 = ($3 > 0) ? 100 * $2 / $3 : 0; print }' | \
sed $'1 i\\\nsample\tmapped_reads\ttotal\tpercent_mapped' > mappability_over_1500bp.txt
```
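Since the two loops differ only in the length cutoff, they could be folded into one parameterised function. The sketch below is an assumption-laden variant, not the code used in the workflow: `mappability_over` is a hypothetical name, and the input is assumed to have the four columns implied above (contig name, contig length, mapped reads, unmapped reads).

```bash
# Hypothetical helper: percentage of reads mapped to contigs >= a length cutoff.
# Usage: mappability_over CUTOFF FILE...
mappability_over() {
    cutoff="$1"
    shift
    printf 'sample\tmapped_reads\ttotal\tpercent_mapped\n'
    for f in "$@"; do
        awk -v cutoff="$cutoff" -v sample="$f" '
            $2 >= cutoff { mapped += $3 }   # reads on contigs that pass the cutoff
            { total += $3 + $4 }            # all reads, mapped and unmapped
            END {
                pct = (total > 0) ? 100 * mapped / total : 0
                printf "%s\t%d\t%d\t%.2f\n", sample, mapped, total, pct
            }
        ' "$f"
    done
}
```

For example, `mappability_over 1000 *counts.txt > mappability_over_1000bp.txt` and `mappability_over 1500 *counts.txt > mappability_over_1500bp.txt` would replace the two loops. Unlike the loops above, this rounds the percentage to two decimals.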
@valentina.galata @cedric.laczny what do you think? Or should I try to code this into the snakefile? The "headache" is that the names of the *BAM* files are super long and different. I could create symlinks like I did in the past, but I want to avoid creating more than we need to.
Manuscript - initial version
Susheel Busi
Valentina Galata <valentina.galata@uni.lu>