... | ... | @@ -356,19 +356,54 @@ vi config/CONFIG.yaml |
|
|
# Also, had to adjust the rules/ASSEMBLY_ANNOTATION_RULES, and the workflows/assembly.smk files
|
|
|
# Adjusted the short-read "name" in the rules files from "NEB2_MG_S17" to "ONT3_MG_xx_Rashi_S11"
|
|
|
# Adjusted the config file so that the first line runs before the second and subsequent lines
|
|
|
# Included the workflows as steps in the [CONFIG.YAML](url) file
|
|
|
# steps: "mmseq metaT mapping binning taxonomy"
|
|
|
steps: "assembly_annotation metaT"
|
|
|
steps: "mapping mmseq binning taxonomy"
|
|
|
# Included the workflows as steps in the [CONFIG.YAML](https://git-r3lab.uni.lu/susheel.busi/ont_pilot_gitlab/-/blob/master/config/CONFIG.yaml) file
|
|
|
|
|
|
steps: "assembly_annotation metaT mapping mmseq binning taxonomy"
|
|
|
|
|
|
# Basecalling taking too long, so running separately
|
|
|
# created a "basecalling_snakefile" and an associated launcher script "run_basecalling_snakemake.sh"
|
|
|
cp src/snakemake_run_use_conda_FINAL.sh run_basecalling_snakemake.sh
|
|
|
./run_basecalling_snakemake.sh
|
|
|
|
|
|
##### Running non-methylation-aware-basecalled results #####
|
|
|
mv -vf results methylation_aware_results
|
|
|
mkdir results
|
|
|
cp -vrf methylation_aware_results/basecalled_NO_MOD results/basecalled
|
|
|
rm *.done
|
|
|
|
|
|
# re-running the whole workflow for the non-methylation aware basecalled results
|
|
|
./src/snakemake_run_use_conda_FINAL.sh
|
|
|
|
|
|
# issues with the "FLYE ASSEMBLY"
|
|
|
# Error: https://github.com/fenderglass/Flye/issues/61
|
|
|
# creating a new conda environment with the latest flye
|
|
|
conda create -n flye flye
|
|
|
conda activate flye
|
|
|
conda env export > envs/flye_v2_7.yaml
|
|
|
sed -i 's/=[^=]*//2g' envs/flye_v2_7.yaml
|
|
|
|
|
|
# adjusted "rules/ASSEMBLY_ANNOTATION_RULES" to look for the new flye conda env
|
|
|
# re-running with new 'flye' environment
|
|
|
./src/snakemake_run_use_conda_FINAL.sh
|
|
|
```
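One caveat with the directory swap used above for the non-methylation-aware run: `mv -vf results methylation_aware_results` moves `results` *into* the target if that directory already exists, instead of renaming it. Below is a minimal sketch of a more defensive variant. The function name `swap_results` and its two arguments are hypothetical and not part of the repository.

```shell
# Hypothetical helper (not in the repo): archive the current "results" directory
# and seed a fresh one from a basecalling output kept inside the archive.
swap_results() {
    archive="$1"      # e.g. methylation_aware_results
    basecalled="$2"   # e.g. basecalled_NO_MOD (a subdirectory of the archive)

    # Refuse to continue if the archive already exists; otherwise mv would
    # place "results" inside it instead of renaming it.
    if [ -e "$archive" ]; then
        echo "refusing to overwrite existing $archive" >&2
        return 1
    fi

    mv -v results "$archive" &&
    mkdir results &&
    cp -vr "$archive/$basecalled" results/basecalled &&
    # -f keeps rm quiet when no *.done marker files are present
    rm -f -- ./*.done
}

# Example call (run from the workflow directory):
#   swap_results methylation_aware_results basecalled_NO_MOD
```

The `-f` on `rm` only suppresses the error when no `*.done` files exist; the set of files removed is otherwise the same as with the plain `rm *.done` above.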
|
|
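The `sed` expression applied above to the exported flye environment file removes the second and every later `=...` field on each line, so `name=version=build` becomes `name=version`. Combining a number with `g` in the substitution flags (`2g`) is a GNU sed feature. Here is a small sketch on made-up sample lines; the helper name `strip_builds` and the build strings are placeholders, not real export output.

```shell
# Hypothetical helper: strip conda build strings from "name=version=build" lines.
# Uses the same expression as the notebook; the "2g" flag requires GNU sed.
strip_builds() {
    sed 's/=[^=]*//2g'
}

# Made-up sample of an exported environment file (build strings are placeholders)
printf '%s\n' \
    'name: flye' \
    'dependencies:' \
    '  - flye=2.7=py27_0' \
    '  - zlib=1.2.11=h0_0' | strip_builds
# The "  - flye=2.7=py27_0" line comes out as "  - flye=2.7";
# lines without "=" pass through unchanged.
```

Running `conda env export --no-builds` should give a similar result without the extra `sed` step.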
|
|
|
|
## Chapter IX - If it ain't broke, why fix it? Well, to beautify it!!
|
|
|
- 2019_GDB data does not have any barcodes, unlike 2018 data which was multiplexed
|
|
|
- So, it is essential to create a generic workflow that also works with datasets that have barcodes
|
|
|
- To improve this, we created the "WORKING_checkpoint_snakefile"
|
|
|
- this includes rules for guppy basecalling, merging the fastq file(s), and creating "dummy" folders
|
|
|
- Added a new 'rule' file and a new 'workflow' file, containing the updated 'checkpoint' rules, to the respective folders
|
|
|
- files created are as below
|
|
|
1. [rules/checkpoint_ASSEMBLY_ANNOTATION_RULES](https://git-r3lab.uni.lu/susheel.busi/ont_pilot_gitlab/-/blob/checkpoint_snakefile/2019_GDB/rules/checkpoint_ASSEMBLY_ANNOTATION_RULES)
|
|
|
2. [workflows/checkpoint_assembly_annotation.smk](https://git-r3lab.uni.lu/susheel.busi/ont_pilot_gitlab/-/blob/checkpoint_snakefile/2019_GDB/workflows/checkpoint_assembly_annotation.smk)
|
|
|
|
|
|
```
cp updated_SNAKEFILE checkpoint_SNAKEFILE
|
|
|
# edited the paths to the 'rules' and 'workflows' in the "checkpoint_SNAKEFILE", to include the above files
|
|
|
# test run the "checkpoint_SNAKEFILE"
|
|
|
snakemake -np -s checkpoint_SNAKEFILE --use-conda # test-run seemed to work fine
|
|
|
```
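The dry run above (`-n` for dry run, `-p` to print the shell commands) can be used as a gate before launching the real run. This is a minimal sketch, assuming a hypothetical wrapper `dry_run_then_run` and an overridable `SNAKEMAKE` variable; neither exists in the repository.

```shell
# Hypothetical wrapper (not in the repo): do a Snakemake dry run first and
# report whether it succeeded. SNAKEMAKE can be overridden, e.g. for testing.
SNAKEMAKE="${SNAKEMAKE:-snakemake}"

dry_run_then_run() {
    snakefile="$1"
    # -n: dry run, -p: print the shell commands that would be executed
    if "$SNAKEMAKE" -n -p -s "$snakefile" --use-conda; then
        echo "dry run OK for $snakefile"
        # Uncomment to launch the real run (the core count is an assumption):
        # "$SNAKEMAKE" -s "$snakefile" --use-conda --cores 4
    else
        echo "dry run FAILED for $snakefile" >&2
        return 1
    fi
}

# Example call:
#   dry_run_then_run checkpoint_SNAKEFILE
```

Keeping the real launch commented out mirrors the notebook, where the actual run goes through the launcher script rather than a direct `snakemake` call.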
|
|
|
|
|
|
## Chapter IX - The miscellaneous or nearly-forgotten side projects
|
|
|
## Chapter X - The miscellaneous or nearly-forgotten side projects
|
|
|
- Due to the multifaceted nature of the beast, i.e. the modular workflow, we tested several aspects separately
|
|
|
- For example: since we used two mappers, bwa-mem and minimap, for the reads, we binned each sample separately based on the mapper
|
|
|
- Additionally, we needed to merge bam files for the "hybrid"-binning, so we compared bins using [sourmash](https://github.com/dib-lab/sourmash) and compared assemblies using [quast](https://github.com/ablab/quast)
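The side comparisons listed above could be scripted roughly as follows. This is only a sketch: all file names are hypothetical, the sourmash calls assume the 4.x-style `sketch dna` subcommand (older releases used `sourmash compute`), and the k-mer and scaling parameters are placeholder choices. With `DRY_RUN=1` (the default here) each command is printed instead of executed.

```shell
# Sketch only: file names and parameters below are hypothetical placeholders.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

merge_bams() {
    # usage: merge_bams OUT.bam IN1.bam IN2.bam ...
    run samtools merge "$@"
}

compare_bins() {
    # usage: compare_bins OUT_PREFIX BIN1.fa BIN2.fa ...
    prefix="$1"; shift
    # Sketch all bins into one signature file, then compare them pairwise
    run sourmash sketch dna -p k=31,scaled=1000 -o "${prefix}.sig" "$@"
    run sourmash compare --csv "${prefix}.csv" "${prefix}.sig"
}

compare_assemblies() {
    # usage: compare_assemblies OUT_DIR ASM1.fa ASM2.fa ...
    out_dir="$1"; shift
    run quast.py -o "$out_dir" "$@"
}

merge_bams hybrid.bam bwa_mem.bam minimap.bam
compare_bins bins_compare bwa_mem_bin.fa minimap_bin.fa
compare_assemblies quast_out assembly_a.fa assembly_b.fa
```

Setting `DRY_RUN=0` would execute the commands, which requires samtools, sourmash and quast to be installed and the input files to exist.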
|
... | ... | |