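done | paste - - - | awk 'BEGIN { OFS = "\t" } { $4 = 100 * $2 / $3; print }' | \
sed $'1 i\\\nsample\tmapped_reads\ttotal\tpercent_mapped' > mappability_over_5000bp.txt
# paste folds every 3 lines (sample, mapped reads, total reads) into one row;
# awk appends percent mapped as a tab-separated 4th column; sed inserts the header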
```
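The tail of the pipeline above can be tried on its own. This is a minimal sketch, assuming the loop feeds three lines per sample (sample name, mapped reads, total reads), which is what `paste - - -` expects; the `mappability_table` function and the sample values are hypothetical. Setting awk's `OFS` to a tab keeps the data rows tab-separated like the header, and an explicit `print` action keeps rows whose percentage is 0.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads three consecutive lines per sample on stdin
# (sample name, mapped reads, total reads) and prints a tab-separated table.
mappability_table() {
  # paste - - -  : join every 3 input lines into one tab-separated row
  # awk          : append percent mapped as column 4, keeping tabs via OFS
  # sed '1 i\'   : insert the header line before the first data row
  paste - - - |
    awk 'BEGIN { OFS = "\t" } { $4 = 100 * $2 / $3; print }' |
    sed $'1 i\\\nsample\tmapped_reads\ttotal\tpercent_mapped'
}

# Made-up example values, not results from this project
printf '%s\n' sampleA 900 1000 sampleB 250 1000 | mappability_table
```

With the two made-up samples this prints the header followed by `sampleA 900 1000 90` and `sampleB 250 1000 25`, all tab-separated.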
###############
# Troubleshooting - Snakemake wanting to re-run everything #
# DIAMOND - RE-RUN #
###############

- We tried to run a single step, namely `TEST_DIAMOND` from `ASSEMBLY_ANNOTATION_RULES`.
- Snakemake instead wanted to re-run everything, including the assemblies.
- A dry-run with the `-r` (`--reason`) flag showed that some input files had a newer timestamp than the outputs derived from them, so Snakemake treated those outputs as out of date.

## TODO: figure out why certain folders have a newer timestamp

- The workaround, shown below, is to inspect the timestamps with `stat` and set the newer input's timestamp back with `touch`.

```
# Look at the timestamps with `stat` and edit the timestamp with `touch`

# 1. check the modification time of the annotation output
stat results/annotation/proteins/flye/lr/merged/no_barcode/assembly.faa

# 2. save a copy of the original FASTA file
cp results/assembly/flye/lr/merged/no_barcode/assembly.fasta results/assembly/flye/lr/merged/no_barcode/assembly.fasta.orig

# 3. set the FASTA's modification time back so it is no longer newer than its outputs
#    (date string in GNU touch syntax)
touch -d '14 May 2020 05:17' results/assembly/flye/lr/merged/no_barcode/assembly.fasta

# 4. dry-run (-n), printing shell commands (-p) and reasons (-r), to check which jobs would run
snakemake -npr -s workflows/assembly_annotation.smk results/annotation/diamond/flye/lr/merged/no_barcode/assembly.tsv

# A new bash script was created to re-run DIAMOND with the "nr_uniprot_trembl" database
./src/run_diamond_only.sh "results/annotation/diamond/metaspades_hybrid/lr_no_barcode-sr_ONT3_MG_xx_Rashi_S11/contigs.tsv \
results/annotation/diamond/megahit/ONT3_MG_xx_Rashi_S11/final.contigs.tsv \
results/annotation/diamond/flye/lr/merged/no_barcode/assembly.tsv"
```
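Before resetting timestamps by hand, it can help to list which inputs are actually newer than a given output. Below is a minimal sketch, assuming bash and GNU coreutils `touch` (for the `-d '14 May 2020 05:17'` date syntax); the `newer_than` function and the temporary file names are hypothetical, not part of this repository. It relies on bash's `[ a -nt b ]` file test, which is the same kind of modification-time comparison Snakemake reports when it says an input was updated; Snakemake may also consider other rerun reasons.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print every file (2nd argument onwards) whose
# modification time is newer than the reference file (1st argument).
newer_than() {
  local ref=$1
  shift
  local f
  for f in "$@"; do
    # [ a -nt b ] is true when a was modified more recently than b
    if [ "$f" -nt "$ref" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Self-contained demo in a temporary directory (illustrative file names)
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

touch -d '14 May 2020 05:17' "$tmp/assembly.fasta"   # input, older
touch -d '15 May 2020 09:00' "$tmp/assembly.faa"     # output, newer
newer_than "$tmp/assembly.faa" "$tmp/assembly.fasta" # prints nothing

touch "$tmp/assembly.fasta"                          # bump the input to "now"
newer_than "$tmp/assembly.faa" "$tmp/assembly.fasta" # prints the .fasta path
```

Applied to this project it would be called as, e.g., `newer_than results/annotation/proteins/flye/lr/merged/no_barcode/assembly.faa results/assembly/flye/lr/merged/no_barcode/assembly.fasta`.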
##### Notes - for 2019_GDB data #####

1. Since **diamond** was inadvertently run against the "new_nr.dmnd" database (BLAST nucleotide database), the `diamond` folder was renamed: `mv diamond diamond_new_nr`