PathoFact 1.0
PathoFact is an easy-to-use modular pipeline for the metagenomic analysis of toxins, virulence factors and antimicrobial resistance. Additionally, PathoFact combines the prediction of these pathogenic factors with the identification of mobile genetic elements. This adds depth to the analysis by considering whether genes are located on mobile genetic elements (MGEs) or on the chromosome. Furthermore, each module of PathoFact (toxins, virulence factors, and antimicrobial resistance) can also be run as a standalone component, making it a flexible and versatile tool.
Requirements
PathoFact requires a working installation of Python (3.6.4), Snakemake (version 5.5.4) and (mini)conda. If Snakemake is not yet installed, it can be installed using the provided conda file (snakemake.yaml). To install, run:
* conda env create -f snakemake.yaml
PathoFact provides conda environments with the dependencies needed to run the incorporated tools. Some of the tools themselves, however, still need to be installed. The following tools must be installed by the user, and their paths adjusted in the config.yaml file:
- HMMER-3.2.1
- signalp-4.1
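Before running the pipeline, it can help to confirm that the tool paths entered in config.yaml really point to executables. The sketch below assumes those paths have already been read into a dictionary; the key names ("hmmsearch", "signalp") and the example paths are hypothetical, not PathoFact's actual config keys, and missing_tools is an illustrative helper rather than part of PathoFact.

```python
import os

# Hypothetical tool-name -> path mapping, standing in for the paths you
# entered in config.yaml. The real key names in PathoFact's config may differ.
TOOL_PATHS = {
    "hmmsearch": "/opt/hmmer-3.2.1/bin/hmmsearch",
    "signalp": "/opt/signalp-4.1/signalp",
}


def missing_tools(tool_paths):
    """Return the names of tools whose configured path is not an executable file."""
    missing = []
    for name, path in tool_paths.items():
        # isfile() rejects directories and missing paths; X_OK checks the execute permission.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing
```

Calling print(missing_tools(TOOL_PATHS)) lists every tool whose path still needs fixing in config.yaml; an empty list means all configured paths are executable files.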
The following tools can either be installed manually or installed automatically by running the set-up.sh script:
- deepARG (v1)
- PlasFlow (v1.1)
- VirSorter (v1.0.5)
- DeepVirFinder (v1.0)
It is recommended to install these tools using the set-up.sh script. If they are installed manually, make sure that the tools are installed in the folder "scripts" and that their paths match those within config.yaml. After installing deepARG, make sure to adjust its configuration manually.
For this, go to the directory where the program was saved (in this case the scripts/deeparg-ss directory within the PathoFact directory) and open the file options.py. Replace the path '/home/gustavo1/tmp/deeparg-ss/' with the current directory (the deepARG path). Finally, on Linux systems, allow diamond to be executed by going to ./bin within the deeparg-ss directory and running:
* chmod +x diamond
For more details on the deepARG configuration, see the deepARG documentation: https://bitbucket.org/gusphdproj/deeparg-ss/src/master/
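The two manual deepARG steps described above can also be scripted. This is a minimal sketch, assuming options.py sits directly in the deeparg-ss directory and diamond in its bin subdirectory, as the instructions describe; configure_deeparg is a hypothetical helper, not part of PathoFact or deepARG.

```python
import stat
from pathlib import Path

# Hard-coded path shipped in deepARG's options.py (quoted in the instructions above).
OLD_PATH = "/home/gustavo1/tmp/deeparg-ss/"


def configure_deeparg(deeparg_dir):
    """Point options.py at the local deepARG directory and make diamond executable."""
    deeparg_dir = Path(deeparg_dir).resolve()

    # Replace the hard-coded path, keeping the trailing slash the original path has.
    options = deeparg_dir / "options.py"
    text = options.read_text()
    options.write_text(text.replace(OLD_PATH, str(deeparg_dir) + "/"))

    # Roughly `chmod +x bin/diamond`: add execute bits for user, group and others.
    diamond = deeparg_dir / "bin" / "diamond"
    mode = diamond.stat().st_mode
    diamond.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Run from the PathoFact directory, configure_deeparg("scripts/deeparg-ss") would apply both changes; check the result against the deepARG documentation if your checkout is laid out differently.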
Usage
Input configuration
The input to the PathoFact pipeline consists of: (i) an amino acid fasta file of translated gene sequences for the prediction of toxins, virulence factors and antimicrobial resistance genes, (ii) a fasta file containing the nucleotide sequences of the corresponding contigs for the prediction of MGEs, and (iii) a tab-delimited table with contig names in the first column and the corresponding gene names in the second column, used to combine the predictions. Contig and gene names must match the original names given in the fasta headers. Furthermore, make sure that no white space is present in the fasta headers. All three input files for one sample must share the same sample name, followed by the suffix .faa (amino acid gene fasta file), .fna (nucleotide contig fasta file) or .contig (table of contig and gene names).
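The input requirements above (matching file names, no white space in headers, contig and gene names that match the fasta headers) can be checked before launching the pipeline. This is a sketch under stated assumptions: check_sample and read_fasta_ids are hypothetical helpers, and the .contig table is assumed to have no header row.

```python
from pathlib import Path


def read_fasta_ids(path):
    """Return the header lines (without the leading '>') of a fasta file."""
    headers = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                headers.append(line[1:].rstrip("\n"))
    return headers


def check_sample(directory, sample):
    """Return a list of problems found in the three input files of one sample."""
    directory = Path(directory)
    faa, fna, contig = (directory / (sample + ext) for ext in (".faa", ".fna", ".contig"))

    problems = [f"missing file: {p.name}" for p in (faa, fna, contig) if not p.is_file()]
    if problems:
        return problems

    genes, contigs = read_fasta_ids(faa), read_fasta_ids(fna)
    # PathoFact requires headers without white space.
    for header in genes + contigs:
        if any(c.isspace() for c in header):
            problems.append(f"whitespace in header: {header!r}")

    gene_set, contig_set = set(genes), set(contigs)
    with open(contig) as handle:
        for lineno, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                problems.append(f"line {lineno}: expected 2 tab-separated columns")
                continue
            # Column 1 is the contig name, column 2 the gene name.
            if fields[0] not in contig_set:
                problems.append(f"line {lineno}: unknown contig {fields[0]!r}")
            if fields[1] not in gene_set:
                problems.append(f"line {lineno}: unknown gene {fields[1]!r}")
    return problems
```

An empty list from check_sample("/path/to/samples", "SAMPLE_A") means the three files are present and consistent with each other.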
Run PathoFact
To run PathoFact, specify the sample name under "input_file" in the config.yaml file. If desired, more than one sample can be run simultaneously, for example:
* input_file: ["SAMPLE_A","SAMPLE_B"]
Under "OUTDIR", give the path to the directory containing the samples; PathoFact writes its results to the same directory, for example:
* OUTDIR: /path/to/samples
Under "project", a unique name for your project must be given, for example:
* project: Project_A_PathoFact
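Taken together, these settings determine which input files PathoFact looks for: each sample name in "input_file" combined with the three suffixes, inside "OUTDIR". The sketch below only illustrates that relationship; the config dictionary mirrors the example values above, and expected_inputs is a hypothetical helper, not how the Snakefile itself resolves its inputs.

```python
from pathlib import Path

# Example values mirroring the config.yaml settings described above.
config = {
    "input_file": ["SAMPLE_A", "SAMPLE_B"],
    "OUTDIR": "/path/to/samples",
    "project": "Project_A_PathoFact",
}


def expected_inputs(config):
    """List the .faa, .fna and .contig files expected for every configured sample."""
    samples = config["input_file"]
    if isinstance(samples, str):  # this sketch also accepts a single name as a plain string
        samples = [samples]
    outdir = Path(config["OUTDIR"])
    return [outdir / (s + ext) for s in samples for ext in (".faa", ".fna", ".contig")]
```

For the example config this yields six paths, from /path/to/samples/SAMPLE_A.faa to /path/to/samples/SAMPLE_B.contig; any that do not exist point to a naming mismatch.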
By default, PathoFact runs the complete pipeline for the prediction of virulence factors, toxins and antimicrobial resistance genes. To run only part of the pipeline, change the value of "w" in the "Snakefile" to one of the following options:
* w = 'complete' (run complete pipeline, default setting)
* w = 'Tox' (run only workflow for Toxin prediction)
* w = 'Vir' (run only workflow for Virulence prediction)
* w = 'AMR' (run only workflow for Antimicrobial resistance and mobile genetic element prediction)
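Because "w" is a plain string, a typo such as 'tox' instead of 'Tox' is easy to make. A guard like the following could catch it early; it is a hypothetical check that assumes the Snakefile compares the option names exactly (case-sensitively) as written above, and check_workflow is not part of PathoFact.

```python
# The four documented values of "w" and what each one runs.
WORKFLOWS = {
    "complete": "toxins, virulence factors and antimicrobial resistance (default)",
    "Tox": "toxin prediction only",
    "Vir": "virulence factor prediction only",
    "AMR": "antimicrobial resistance and mobile genetic element prediction only",
}


def check_workflow(w):
    """Return the description of workflow w, or raise ValueError for an unknown name."""
    if w not in WORKFLOWS:
        raise ValueError(f"unknown workflow {w!r}; choose one of {sorted(WORKFLOWS)}")
    return WORKFLOWS[w]
```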
An example script for running the Snakemake pipeline is provided (run_PathoFact.sh), but the basic command to run the pipeline is:
* snakemake -s Snakefile --use-conda -p
Alternatively, the number of cores/jobs Snakemake may use in parallel can be set with -j (when analysing larger files, it is advised to run on multiple cores or on cores with more memory):
* snakemake -s Snakefile -j [number of threads/jobs] --use-conda -p
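When PathoFact is launched from another Python script rather than from run_PathoFact.sh, the same command can be assembled as an argument list. This minimal sketch uses only the flags shown above (-s, -j, --use-conda, -p); snakemake_command is a hypothetical helper.

```python
import shlex


def snakemake_command(cores=None, snakefile="Snakefile"):
    """Build the snakemake invocation shown above; -j is added only when cores is given."""
    cmd = ["snakemake", "-s", snakefile]
    if cores is not None:
        cmd += ["-j", str(cores)]
    cmd += ["--use-conda", "-p"]
    return cmd


def as_shell(cmd):
    """Quote the argument list for display or for pasting into a shell."""
    return " ".join(shlex.quote(part) for part in cmd)
```

To actually launch the run, pass the list to subprocess.run(snakemake_command(cores=8), check=True), which raises an error if Snakemake exits with a non-zero status.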