# PathoFact v1.0 

PathoFact is an easy-to-use modular pipeline for the metagenomic analyses of toxins, virulence factors and antimicrobial resistance. 
Additionally, PathoFact combines the prediction of these pathogenic factors with the identification of mobile genetic elements. 
This provides further depth to the analysis by considering the localization of the genes on mobile genetic elements (MGEs), as well as on the chromosome. 
Furthermore, each module (toxins, virulence factors, and antimicrobial resistance) of PathoFact is also a standalone component, making it a flexible and versatile tool. 

For further information regarding usage and generated reports, please see the [Documentation](https://git-r3lab.uni.lu/laura.denies/PathoFact/-/wikis/home).

# Requirements and installation

The main requirements are:
- `gcc/g++`
- `git` and `git lfs`
- `conda (version 4.9.2)`
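
Whether the main requirements are available on the `PATH` can be checked with a short loop. This is a minimal sketch: the helper name `check_tool` is made up for illustration and is not part of PathoFact, it assumes that `git lfs` is installed as the `git-lfs` executable, and it does not verify versions.

```bash
# Report whether a command is available on the PATH.
# `check_tool` is a hypothetical helper, not part of PathoFact.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Assumption: `git lfs` is provided by an executable called `git-lfs`.
for tool in gcc g++ git git-lfs conda; do
    check_tool "$tool"
done
```

The installed conda version can then be compared against the one listed above with `conda --version`.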

Most other dependencies are either included as a `git submodule` or will be installed automatically by `snakemake` using the `conda` YAML files in `envs/`.
However, some tools need to be installed and/or configured manually.

## Miniconda (conda)

```bash
# install miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh
chmod u+x Miniconda3-py37_4.9.2-Linux-x86_64.sh
./Miniconda3-py37_4.9.2-Linux-x86_64.sh # follow the instructions
```

## PathoFact

Clone the repository and its sub-modules:

```bash
# activate git lfs
git lfs install
# clone branch incl. sub-modules
git clone -b master --recursive https://git-r3lab.uni.lu/laura.denies/PathoFact.git
```

## Pipeline environment

```bash
# create the conda environment
conda env create -f=envs/PathoFact.yaml
```

You can activate and deactivate the environment using `conda activate PathoFact` and `conda deactivate`, respectively. PathoFact requires `snakemake` version >= 5.5.4.

## Dependency: `SignalP`

Required version: `5.0`

To download the tool you need to submit a request at https://services.healthtech.dtu.dk/:

- Look for "SignalP" and click on the link
- Click on "Downloads"
- Click on the link for your platform under "Version 5.0g"
- Fill in and submit the form

After the installation, adjust the path for this tool in `config.yaml` and `test/test_config.yaml` (keyword `signalp`).

# Usage

## Input files

Each sample should have one input file:

- `*.fna`: FASTA file containing nucleotide sequences of the contigs
    - no whitespace in FASTA headers (e.g. to truncate each header at its first space: `sed -i '/^>/ s/ .*//' file.fna`)
    - used for the prediction of mobile genetic elements and as input for Prodigal
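
The effect of the header clean-up can be tried on a throwaway file first. The sketch below uses made-up contig names and sequences, and writes to a new file instead of editing in place so the original stays untouched. Note that the `sed` expression only cuts at spaces; the final check also counts tabs.

```bash
# Demo on a throwaway file; the file name and sequences are made up.
tmpdir=$(mktemp -d)
printf '>contig_1 length=8 cov=3.2\nACGTACGT\n>contig_2 length=4\nGGCC\n' > "$tmpdir/SAMPLE_A.fna"

# Truncate each header at the first space; sequence lines are left untouched.
sed '/^>/ s/ .*//' "$tmpdir/SAMPLE_A.fna" > "$tmpdir/SAMPLE_A.clean.fna"

# Count headers that still contain whitespace.
# grep exits non-zero when nothing matches, hence the `|| true`.
remaining=$(grep -c '^>.*[[:space:]]' "$tmpdir/SAMPLE_A.clean.fna" || true)
echo "headers with whitespace: $remaining"
cat "$tmpdir/SAMPLE_A.clean.fna"
```

Truncating headers can produce duplicate IDs if the first word of a header is not unique, so it is worth checking the cleaned headers before running the pipeline.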

The following files are generated by PathoFact itself:

- `*.faa`: FASTA file containing translated gene sequences, i.e. amino acid sequences
    - no whitespace in FASTA headers
    - for prediction of toxins, virulence factors and antimicrobial resistance genes
- `*.contig`: TAB-delimited file containing a mapping from contig ID (1st column) to gene ID (2nd column)
    - no header, one gene ID per line
    - contig and gene IDs should be the same as in the FASTA files
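
The layout of a `*.contig` file as described above can be checked with standard tools. This is only a sketch on a demo file with made-up IDs; the duplicate check rests on the assumption that a gene ID should appear only once, which the description above does not state explicitly.

```bash
# Demo mapping file; the IDs are made up for illustration.
tmpdir=$(mktemp -d)
printf 'contig_1\tcontig_1_1\ncontig_1\tcontig_1_2\ncontig_2\tcontig_2_1\n' > "$tmpdir/SAMPLE_A.contig"

# Count lines that do not have exactly two non-empty TAB-separated fields.
bad=$(awk -F '\t' 'NF != 2 || $1 == "" || $2 == "" { n++ } END { print n + 0 }' "$tmpdir/SAMPLE_A.contig")
echo "malformed lines: $bad"

# Count gene IDs (2nd column) that occur more than once.
# Assumption: each gene ID is expected to appear on a single line only.
dups=$(cut -f 2 "$tmpdir/SAMPLE_A.contig" | sort | uniq -d | wc -l | tr -d ' ')
echo "duplicated gene IDs: $dups"
```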

The input files of all samples should be located in the same directory.
For each sample, the corresponding input files should have the same basename, e.g. `SAMPLE_A.fna` for sample `SAMPLE_A`.
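
Before a run, it can help to confirm that every sample has its `.fna` file in the data directory. The following sketch uses a temporary directory and made-up sample names in place of real data; in practice the directory and the sample list would be those set in the configuration.

```bash
# Demo directory standing in for the data directory; names are made up.
datapath=$(mktemp -d)
touch "$datapath/SAMPLE_A.fna"

# Report samples whose <sample>.fna file is missing from the data directory.
missing=""
for sample in SAMPLE_A SAMPLE_B; do
    if [ -f "$datapath/${sample}.fna" ]; then
        echo "$sample: ok"
    else
        echo "$sample: ${sample}.fna not found"
        missing="$missing $sample"
    fi
done
```

In this demo only `SAMPLE_A.fna` exists, so `SAMPLE_B` is reported as not found.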

**NOTE**: For preprocessing and assembly of metagenomic reads, we suggest using IMP (https://imp.pages.uni.lu/web/).

## Run PathoFact

### Configuration

To run PathoFact you need to adjust some parameters in `config.yaml`.

- `sample`: This is a list of sample names, e.g. `sample: ["SAMPLE_A","SAMPLE_B"]`
- `project`: A unique project name, used as the name of the output directory inside the `datapath` directory (see below).
- `datapath`: Path to directory containing the sample data; the output directory will be created there.
- `workflow`: PathoFact can run the complete pipeline (default) or a specific step:
    - "complete": complete pipeline = toxin + virulence + AMR + MGE prediction
    - "Tox": toxin prediction
    - "Vir": virulence prediction
    - "AMR": antimicrobial resistance (AMR) & mobile genetic elements (MGEs) prediction
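
To review these entries before a run, they can be listed with `grep`. The sketch below works on a demo file with placeholder values. The shipped `config.yaml` contains further keys (e.g. `signalp`) and may nest or indent the entries differently, so treat the shipped file as authoritative; the pattern allows leading whitespace for that reason.

```bash
# Demo file standing in for the relevant part of config.yaml.
# All values are placeholders, and the flat layout is an assumption.
tmpdir=$(mktemp -d)
cat > "$tmpdir/config_demo.yaml" <<'EOF'
sample: ["SAMPLE_A","SAMPLE_B"]
project: "my_project"
datapath: "/path/to/data"
workflow: "complete"
EOF

# List the four entries described above, with line numbers.
# Leading whitespace is allowed so that indented keys are matched as well.
grep -nE '^[[:space:]]*(sample|project|datapath|workflow)[[:space:]]*:' "$tmpdir/config_demo.yaml"
```

With these placeholder values, the output directory would be `my_project` inside `/path/to/data`.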

### Execution

Basic command to run the pipeline using `<cores>` CPUs:

```bash
# activate the env
conda activate PathoFact
# run the pipeline
# set <cores> to the number of cores to use, e.g. 10
snakemake -s Snakefile --use-conda --reason --cores <cores> -p 
```

**NOTE**: Add parameter `-n` (or `--dry-run`) to the command to see which steps will be executed without running them.

**NOTE**: Add `--configfile <configfile.yaml>` to use a different config file than `config.yaml`. 

**NOTE**: It is advisable to run the pipeline with multiple CPUs or on a machine with more memory.

For more options, see the [snakemake documentation](https://snakemake.readthedocs.io/en/stable/index.html).

### Execution on a cluster

The pipeline can be run on a cluster using `slurm`.
The command can be found in the script `cluster.sh` which can also be used to submit the jobs to the cluster.

```bash
sbatch cluster.sh
```
### Test module

To check that the pipeline is installed correctly, run the test module.
First set the path to the SignalP v5.0 installation in the config file `test/test_config.yaml`, then run:

```bash
# activate env
conda activate PathoFact
# run the pipeline
snakemake -s test/Snakefile --use-conda --reason --cores 1 -p
```