# PathoFact 1.0

PathoFact is an easy-to-use modular pipeline for the metagenomic analyses of toxins, virulence factors and antimicrobial resistance. 
Additionally, PathoFact combines the prediction of these pathogenic factors with the identification of mobile genetic elements. 
This provides further depth to the analysis by considering the localization of the genes on mobile genetic elements (MGEs), as well as on the chromosome. 
Furthermore, each module (toxins, virulence factors, and antimicrobial resistance) of PathoFact is also a standalone component, making it a flexible and versatile tool. 

# Requirements

PathoFact requires a working installation of Python (3.6.4), snakemake (version 5.5.4) and (mini)conda.
If snakemake is not yet installed, it can be installed using the provided conda file (snakemake.yaml).
To install, run: conda env create -f snakemake.yaml

PathoFact provides the conda environments with the dependencies needed to run the incorporated tools.
Some of the tools themselves, however, still need to be installed.
The following tools need to be installed by the user, and the paths to the tools adjusted in the config.yaml file:
  
* HMMER-3.2.1
* SignalP-4.1

The following tools can either be installed manually or installed automatically by running the set-up.sh script:

* deepARG (v1)
* PlasFlow (v1.1)
* VirSorter (v1.0.5)
* DeepVirFinder (v1.0)

It is recommended to install these tools using the set-up.sh script.
If they are installed manually, make sure that the tools are placed in the "scripts" folder and that their paths match those in config.yaml.
After installing deepARG, its configuration needs to be adjusted manually.

To do this, go to the directory where the program was saved (in this case the scripts/deeparg-ss directory within the PathoFact directory) and open the file options.py.
Replace the path '/home/gustavo1/tmp/deeparg-ss/' with the current directory (the deepARG path).
Finally, on Linux systems, allow diamond to be executed by going to ./bin within the deeparg-ss directory and running:

    chmod +x diamond

For more details on the deepARG configuration, see the deepARG documentation: https://bitbucket.org/gusphdproj/deeparg-ss/src/master/
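
The two manual steps above (replacing the hard-coded path in options.py and making diamond executable) could also be scripted. The following is only a minimal sketch and is not part of PathoFact: the function name `configure_deeparg` is hypothetical, and it assumes that options.py contains the path exactly as quoted above, including the trailing slash.

```python
import stat
from pathlib import Path

# Hard-coded path quoted in the instructions above.
OLD_PATH = "/home/gustavo1/tmp/deeparg-ss/"


def configure_deeparg(deeparg_dir):
    """Point options.py at deeparg_dir and make bin/diamond executable.

    Hypothetical helper: assumes the layout described above
    (options.py and bin/diamond inside the deeparg-ss directory).
    """
    deeparg_dir = Path(deeparg_dir).resolve()

    # Replace the original author's path with the local deepARG path,
    # keeping the trailing slash of the original value.
    options = deeparg_dir / "options.py"
    text = options.read_text()
    options.write_text(text.replace(OLD_PATH, f"{deeparg_dir}/"))

    # Roughly what "chmod +x diamond" does: add the execute bit
    # for user, group and others on top of the existing mode.
    diamond = deeparg_dir / "bin" / "diamond"
    mode = diamond.stat().st_mode
    diamond.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Called as `configure_deeparg("scripts/deeparg-ss")` from the PathoFact directory, this would edit options.py in place, so keeping a copy of the original file is advisable.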

# Usage

## Input configuration
The input to the PathoFact pipeline consists of: (i) an amino acid fasta file of translated gene sequences for the prediction of toxins, virulence factors and antimicrobial resistance genes, (ii) a fasta file containing nucleotide sequences of the corresponding contigs for the prediction of MGEs, and (iii) a tab-delimited table with contig names in the first column and the corresponding gene names in the second column, used to combine predictions.
Contig and gene names need to match the original names given in the fasta headers. Furthermore, make sure that no white spaces are present in the fasta headers.
All three input files used by the pipeline for one sample need to share the same sample name, followed by the suffix .faa (amino acid gene fasta file), .fna (nucleotide contig fasta file) or .contig (table with contig and gene names).
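
Because the pipeline depends on these naming rules, it can help to check a sample before starting a run. The sketch below is not part of PathoFact; the function names are hypothetical, and it assumes that the .contig table has no header row and that the fasta headers (without the leading ">") are exactly the names used in the table.

```python
from pathlib import Path


def read_fasta_headers(path):
    """Return the header lines of a fasta file without the leading '>'."""
    headers = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                headers.append(line[1:].rstrip("\n"))
    return headers


def check_sample(directory, sample):
    """Return a list of problems found in the three input files of one sample."""
    base = Path(directory)
    paths = {s: base / f"{sample}{s}" for s in (".faa", ".fna", ".contig")}
    problems = [f"missing file: {p.name}" for p in paths.values() if not p.is_file()]
    if problems:
        return problems

    genes = read_fasta_headers(paths[".faa"])
    contigs = read_fasta_headers(paths[".fna"])

    # Rule from the text: no white space in any fasta header.
    for header in genes + contigs:
        if any(ch.isspace() for ch in header):
            problems.append(f"white space in fasta header: {header!r}")

    # Rule from the text: names in the table must match the fasta headers.
    gene_set, contig_set = set(genes), set(contigs)
    table_name = paths[".contig"].name
    with open(paths[".contig"]) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore empty lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                problems.append(f"{table_name} line {number}: expected two tab-separated columns")
                continue
            contig, gene = fields[0], fields[1]
            if contig not in contig_set:
                problems.append(f"{table_name} line {number}: unknown contig {contig!r}")
            if gene not in gene_set:
                problems.append(f"{table_name} line {number}: unknown gene {gene!r}")
    return problems
```

An empty list from `check_sample("/path/to/samples", "SAMPLE_A")` means none of the checked rules were violated; it does not guarantee that the files are otherwise valid.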

## Run PathoFact
To run PathoFact, the sample name is given in the config.yaml file at "input_file". If desired, more than one sample can be run simultaneously, for example:

    * input_file: ["SAMPLE_A","SAMPLE_B"]

In "OUTDIR" the path to the samples is given; the PathoFact results are deposited in the same directory, for example:

    * OUTDIR: /path/to/samples

In "project" a unique name for your project needs to be given, for example:

    * project:  Project_A_PathoFact
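
When a directory holds many samples, the list for "input_file" can be derived from the file names. This is a small sketch under assumptions, not a PathoFact feature: the helper names are hypothetical, and a sample is only reported if all three files (.faa, .fna, .contig) are present in the directory.

```python
from pathlib import Path

SUFFIXES = (".faa", ".fna", ".contig")


def find_samples(directory):
    """Return sorted sample names that have all three input files."""
    base = Path(directory)
    # Candidate names are taken from the .faa files in the directory.
    names = {p.name[: -len(".faa")] for p in base.glob("*.faa")}
    return sorted(
        name
        for name in names
        if all((base / f"{name}{suffix}").is_file() for suffix in SUFFIXES)
    )


def input_file_line(samples):
    """Format sample names in the same style as the input_file example."""
    quoted = ",".join(f'"{name}"' for name in samples)
    return f"input_file: [{quoted}]"
```

For example, `print(input_file_line(find_samples("/path/to/samples")))` prints a line that can be copied into config.yaml by hand.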

By default, PathoFact runs the complete pipeline for the prediction of virulence factors, toxins and antimicrobial resistance genes.
If only part of the pipeline should be run, this can be indicated **within** the "Snakefile" by changing "w" to a different option:

    * w = 'complete'     (run complete pipeline, default setting)
    * w = 'Tox'          (run only workflow for Toxin prediction)
    * w = 'Vir'          (run only workflow for Virulence prediction)
    * w = 'AMR'          (run only workflow for Antimicrobial resistance and mobile genetic element prediction)

To run the snakemake pipeline, an example script is provided (run_PathoFact.sh). The basic command to run the pipeline is:

    * snakemake -s Snakefile --use-conda -p
    
Alternatively, one can set the number of threads/jobs with -j (when analysing bigger files, it is advised to run on multiple cores or on cores with more memory):

    * snakemake -s Snakefile -j [number of threads/jobs] --use-conda -p
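
If the pipeline is started from another script rather than from the shell, the same commands can be assembled programmatically. The following is a minimal sketch, not the provided run_PathoFact.sh: the function names are hypothetical, and only the flags shown above (-s, -j, --use-conda, -p) are used.

```python
import subprocess


def snakemake_command(snakefile="Snakefile", jobs=None):
    """Build the snakemake call shown above as an argument list."""
    command = ["snakemake", "-s", snakefile]
    if jobs is not None:
        if jobs < 1:
            raise ValueError("jobs must be a positive integer")
        # Optional: number of threads/jobs, as in the second command above.
        command += ["-j", str(jobs)]
    command += ["--use-conda", "-p"]
    return command


def run_pathofact(snakefile="Snakefile", jobs=None):
    """Run snakemake; raises CalledProcessError if it exits with an error."""
    subprocess.run(snakemake_command(snakefile, jobs), check=True)
```

`run_pathofact(jobs=8)` would then correspond to running `snakemake -s Snakefile -j 8 --use-conda -p` in the PathoFact directory, with the snakemake environment activated.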


