Commit 031fb8cf authored by Susana MARTINEZ

update

parent 4f0b55ce
## 3- **MgeDereplication workflow**
* **Inputs**: all phage and plasmid predictions
* **Steps and outputs**:
- collect the lists of phage and plasmid predictions, and the PSpCCs (protospacer-containing contigs)
- fetch the sequences
- run CD-HIT (parameters are set in the rule `collapse_candidate_mobile_elements.rule`)
- generate a table of MGE info (contig name, type, etc.)
* **Launcher** (adjust memory request, input paths, etc.)
```
#!/bin/bash -l
#OAR -n mgeDereplication
#OAR -l nodes=1/core=8,walltime=120
source /home/users/smartinezarbas/git/gitlab/CRISPR_MGE_pipeline/src/preload_modules.sh
export THREADS='8'
export TS_DIR='/work/users/smartinezarbas/comparative_analysis/AmazonRiver/IMP_results'
export DB_FA_DIR='/work/users/smartinezarbas/comparative_analysis/AmazonRiver/IMP_results/Assemblies'
export TS_SAMPLES='TS2 TD2'
export MGE_OUTDIR='/scratch/users/smartinezarbas/AmazonRiverCRISPR_MGE/MGE_prediction'
snakemake -j 8 -pf mge_dereplication_workflow.done -s workflows/MgePrediction
```
\ No newline at end of file
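As a rough illustration of the last step above, the MGE info table could be assembled along these lines. This is a sketch, not the pipeline's rule: the function name `build_mge_table`, the one-ID-per-line list files, the three output columns and the `unlisted` label (for example, contigs that come only from the PSpCC list) are all assumptions.

```shell
#!/bin/bash
# Sketch only (not the pipeline's rule): build a small MGE info table.
# Assumed inputs: two plain-text ID lists (one contig name per line) and
# the dereplicated FASTA. All names and the column layout are hypothetical.
build_mge_table() {
    local phage_ids="$1" plasmid_ids="$2" fasta="$3"
    printf 'contig\ttype\tlength\n'
    awk -v OFS='\t' '
        # Print the record collected so far, if any.
        function emit() {
            if (name != "") print name, ((name in type) ? type[name] : "unlisted"), len
        }
        # First file: phage predictions.
        FILENAME == ARGV[1] { type[$1] = "phage"; next }
        # Second file: plasmid predictions; keep both labels if a contig is in both lists.
        FILENAME == ARGV[2] { type[$1] = (type[$1] ~ /^phage/) ? "phage;plasmid" : "plasmid"; next }
        # Third file: FASTA. A header closes the previous record and opens a new one.
        /^>/ { emit(); name = substr($1, 2); len = 0; next }
        { len += length($0) }
        END { emit() }
    ' "$phage_ids" "$plasmid_ids" "$fasta"
}

# Usage (hypothetical file names):
# build_mge_table phage_ids.txt plasmid_ids.txt mge_dereplicated.fa > mge_info.tsv
```

Keeping the type labels in a single column makes it easy to spot contigs predicted as both phage and plasmid before any downstream filtering.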
## 5- **MgeHostLink**
* **Inputs**: parts of the outputs of all the previous workflows, plus the output of binning dereplication (linking a bin to a CRISPR requires, at minimum, contigs assigned to bins)
* **Steps**:
- blast of bins against CRISPR flanks (from workflow **CrisprPrediction**)
- blast of bins against CRISPR repeats (from workflow **CrisprPrediction**)
- identification of hosts: filter by matches to flank and repeat sequences, and by coverage and identity
- link CRISPR spacers to protospacers (formatting and adding info to protospacers identified in workflow **CrisprPrediction**)
- link spacers with candidate hosts
- link hosts with targeted protospacers
* **Launcher**
```
#!/bin/bash -l
#OAR -n mgeHostLink_test
#OAR -l nodes=1/core=1,walltime=4
source /home/users/smartinezarbas/git/gitlab/CRISPR_MGE_pipeline/src/preload_modules.sh
export THREADS='3'
export TS_DIR='/work/users/smartinezarbas/comparative_analysis/AmazonRiver/IMP_results'
export TS_SAMPLES='TS2 TD2'
export DB_FA_DIR='/work/users/smartinezarbas/comparative_analysis/AmazonRiver/IMP_results/Assemblies'
export BIN_DICT='metadata/bin_conversion.tsv'
export OUTDIR='/scratch/users/smartinezarbas/AmazonRiverCRISPR_MGE/MGE_CrispHost_link'
snakemake -npf mge_host_link_workflow.done -s workflows/MgeHostLink
```
\ No newline at end of file
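The host identification step can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow's actual rule: it assumes BLAST tabular output written with `-outfmt "6 qseqid sseqid pident length qlen"`, with flanks or repeats as queries and bin contigs as subjects, and it assumes a candidate host contig must pass the thresholds for both a flank and a repeat. The function name `candidate_hosts` and the default cut-offs (95 % identity, 90 % query coverage) are placeholders.

```shell
#!/bin/bash
# Sketch only: list contigs hit by BOTH a CRISPR flank and a CRISPR repeat.
# Assumed columns (tab-separated): qseqid sseqid pident length qlen.
# Thresholds are hypothetical defaults, not the pipeline's values.
candidate_hosts() {
    local flank_hits="$1" repeat_hits="$2" min_ident="${3:-95}" min_cov="${4:-90}"
    awk -F'\t' -v id="$min_ident" -v cov="$min_cov" '
        # Drop alignments below the identity or query-coverage threshold.
        $5 <= 0 || $3 < id || 100 * $4 / $5 < cov { next }
        # First file: remember contigs with a passing flank hit.
        FNR == NR { flank[$2] = 1; next }
        # Second file: report each contig once if it also has a passing repeat hit.
        ($2 in flank) && !seen[$2]++ { print $2 }
    ' "$flank_hits" "$repeat_hits"
}

# Usage (hypothetical file names):
# candidate_hosts flanks_vs_bins.tsv repeats_vs_bins.tsv 95 90 > candidate_host_contigs.txt
```

The resulting contig names could then be translated to bins with the table given in `BIN_DICT`.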
## 4- **MgeRemapping**
* **Inputs**: preprocessed MG and MT reads, list of unique MGEs
* **Steps**:
- annotation
- index
- map reads to the MGE contigs
- featureCounts: gene and contig levels
- calculate abundance: average depth of coverage per contig, for MG and MT reads
* **Launcher**
```
#!/bin/bash -l
#OAR -n mgeRemapping_test
#OAR -l nodes=1/core=12,walltime=96
source /home/users/smartinezarbas/git/gitlab/CRISPR_MGE_pipeline/src/preload_modules.sh
export THREADS='3'
export TS_DIR='/work/users/smartinezarbas/comparative_analysis/AmazonRiver/IMP_results'
export TS_SAMPLES='TS2 TD2'
export IGE_DIR='/scratch/users/smartinezarbas/AmazonRiverCRISPR_MGE/MGE_dereplication'
export OUTDIR='/scratch/users/smartinezarbas/AmazonRiverCRISPR_MGE/MGE_remapping'
snakemake -j 4 -pfk mge_remapping_workflow.done -s workflows/MgeRemapping
```
\ No newline at end of file
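The abundance step above (average depth of coverage per contig) amounts to a per-contig mean over per-base depths. A minimal sketch, assuming a three-column, tab-separated depth file (contig, position, depth) that lists every position, such as the output of `samtools depth -a`; the function name `mean_depth_per_contig` and the file layout are assumptions, not the workflow's own code.

```shell
#!/bin/bash
# Sketch only: mean depth of coverage per contig from a per-base depth table.
# Assumed input columns (tab-separated): contig, position, depth.
# Positions missing from the file are not counted, so the file should
# report zero-depth positions as well.
mean_depth_per_contig() {
    awk -F'\t' '
        { sum[$1] += $3; n[$1]++ }
        END { for (c in sum) printf "%s\t%.2f\n", c, sum[c] / n[c] }
    ' "$1" | sort -k1,1
}

# Usage (hypothetical file names), once per read type:
# mean_depth_per_contig MG.depth.tsv > MG.mean_depth.tsv
# mean_depth_per_contig MT.depth.tsv > MT.mean_depth.tsv
```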
......@@ -7,7 +7,7 @@ The CRISPR-MGE pipeline identifies the CRISPRs from reads and contigs, and invas
* [iMGEs prediction](CRISPR-prediction.md)
* [iMGEs dereplication](MGE-dereplication.md): collection of predicted MGEs and redundancy removal.
* [iMGEs remapping](MGE-remapping.md): remapping of all the metagenomic and metatranscriptomic reads to the iMGE sequences.
* **MgeHostLink**: identification of candidate hosts, their spacers composition and the link with the protospacer-containing contigs.
* [iMGE-Hosts CRISPR-mediated links](MGE-host-link.md): identification of candidate hosts, their spacers composition and the link with the protospacer-containing contigs.
## Dependencies
- [CRASS](http://ctskennerton.github.io/crass/)
......@@ -23,51 +23,6 @@ The CRISPR-MGE pipeline identifies the CRISPRs from reads and contigs, and invas
- R version 3.4.0: packages `tidyverse`, `ggplot2`, `reshape2`, (...)
## 2- **MgePrediction workflow**
* **Inputs**: IMP results (MGMT co-assembled contigs and MT contigs)
* **Steps and outputs**:
- prediction of **phages** by [VirSorter](https://github.com/simroux/VirSorter) and [VirFinder](https://github.com/jessieren/VirFinder)
- prediction of **plasmids** by [cBar](http://csbl.bmb.uga.edu/~ffzhou/cBar/) and [PlasFlow](https://github.com/smaegol/PlasFlow)
* **Launcher** (adjust memory requests, input paths, etc.)
```
#!/bin/bash -l
#OAR -n crisprPrediction
#OAR -l nodes=1/core=8,walltime=120
source /home/users/smartinezarbas/git/gitlab/CRISPR_MGE_pipeline/src/preload_modules.sh
export THREADS='8'
export TS_DIR='/work/users/smartinezarbas/comparative_analysis/AmazonRiver/IMP_results'
export DB_FA_DIR='/work/users/smartinezarbas/comparative_analysis/AmazonRiver/IMP_results/Assemblies'
export TS_SAMPLES='TS2 TD2'
export MGE_OUTDIR='/scratch/users/smartinezarbas/AmazonRiverCRISPR_MGE/MGE_prediction'
snakemake -j 8 -pf plasmid_phage_prediction_workflow.done -s workflows/MgePrediction
```
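Since the launcher relies on exported variables that must be adjusted per run, a small pre-flight check before the `snakemake` call can catch a forgotten or empty variable early. The helper below is a suggestion only and is not part of the pipeline; the name `check_launcher_env` is hypothetical.

```shell
#!/bin/bash
# Sketch only: verify that the variables a launcher exports are set and non-empty.
# printenv only sees exported variables, which is what snakemake will inherit.
check_launcher_env() {
    local missing=0 name
    for name in "$@"; do
        if [ -z "$(printenv "$name")" ]; then
            echo "missing or empty: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage inside a launcher, just before the snakemake call:
# check_launcher_env THREADS TS_DIR DB_FA_DIR TS_SAMPLES MGE_OUTDIR || exit 1
```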
## 3- **MgeDereplication workflow**
* **Inputs**: all phage and plasmid predictions
* **Steps and outputs**:
- collect the lists of phage and plasmid predictions, and the PSpCCs (protospacer-containing contigs)
- fetch the sequences
- run CD-HIT (parameters are set in the rule `collapse_candidate_mobile_elements.rule`)
- generate a table of MGE info (contig name, type, etc.)
* **Launcher** (adjust memory request, input paths, etc.)
```
#!/bin/bash -l
#OAR -n mgeDereplication
#OAR -l nodes=1/core=8,walltime=120
source /home/users/smartinezarbas/git/gitlab/CRISPR_MGE_pipeline/src/preload_modules.sh
export THREADS='8'
export TS_DIR='/work/users/smartinezarbas/comparative_analysis/AmazonRiver/IMP_results'
export DB_FA_DIR='/work/users/smartinezarbas/comparative_analysis/AmazonRiver/IMP_results/Assemblies'
export TS_SAMPLES='TS2 TD2'
export MGE_OUTDIR='/scratch/users/smartinezarbas/AmazonRiverCRISPR_MGE/MGE_prediction'
snakemake -j 8 -pf mge_dereplication_workflow.done -s workflows/MgePrediction
```
## 4- **MgeRemapping**
* **Inputs**: preprocessed MG and MT reads, list of unique MGEs
* **Steps**:
......