# PathoFact 1.0

PathoFact is an easy-to-use modular pipeline for the metagenomic analysis of toxins, virulence factors and antimicrobial resistance.
Additionally, PathoFact combines the prediction of these pathogenic factors with the identification of mobile genetic elements (MGEs).
This provides further depth to the analysis by considering whether the genes are located on MGEs or on the chromosome.
Furthermore, each module (toxins, virulence factors, and antimicrobial resistance) of PathoFact is also a standalone component, making it a flexible and versatile tool.

# Requirements

PathoFact requires a working installation of Python (3.6.4), snakemake (version 5.5.4) and (mini)conda.
If snakemake is not yet installed, it can be installed using the provided conda file (snakemake.yaml).
To install, run: conda env create -f snakemake.yaml

PathoFact provides the conda environments with the dependencies needed to run the incorporated tools.
Some of the tools themselves, however, still need to be installed.
The following tools need to be installed by the user, and the paths to the tools adjusted in the config.yaml file:
  
* HMMER-3.2.1
* SignalP-4.1

The following tools can either be installed manually or be installed automatically by running the set-up.sh script:

* deepARG (v1)
* PlasFlow (v1.1)
* VirSorter (v1.0.5)
* DeepVirFinder (v1.0)

It is recommended to install these tools using the set-up.sh script.
If they are installed manually, make sure that the tools are placed in the "scripts" folder and that their paths match those in the config.yaml file.
After installing deepARG, make sure to adjust its configuration manually.

To do this, go to the directory where the program was saved (in this case the scripts/deeparg-ss directory within the PathoFact directory) and open the file options.py.
Replace the path '/home/gustavo1/tmp/deeparg-ss/' with the current directory (the deepARG path).
Finally, on Linux systems, to allow diamond to be executed, go to ./bin within the deeparg-ss directory and run:
chmod +x diamond
For further details on the deepARG configuration, see the deepARG documentation: https://bitbucket.org/gusphdproj/deeparg-ss/src/master/
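
The manual deepARG adjustment described above could also be scripted. The following is only a minimal sketch: the function names are hypothetical, and it assumes that the placeholder path appears verbatim in options.py and that the diamond binary sits in the bin subdirectory, as stated in this README. It has not been checked against a real deepARG installation.

```python
import stat
from pathlib import Path

# Placeholder path quoted in this README; assumed to appear verbatim in options.py.
OLD_PATH = "/home/gustavo1/tmp/deeparg-ss/"


def patch_options(deeparg_dir):
    """Replace the placeholder path in options.py; return the number of replacements."""
    deeparg_dir = Path(deeparg_dir).resolve()
    options = deeparg_dir / "options.py"
    text = options.read_text()
    count = text.count(OLD_PATH)
    # Keep the trailing slash, as in the original placeholder.
    options.write_text(text.replace(OLD_PATH, str(deeparg_dir) + "/"))
    return count


def make_diamond_executable(deeparg_dir):
    """Python equivalent of running `chmod +x diamond` inside ./bin."""
    diamond = Path(deeparg_dir) / "bin" / "diamond"
    mode = diamond.stat().st_mode
    diamond.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Calling `patch_options("scripts/deeparg-ss")` followed by `make_diamond_executable("scripts/deeparg-ss")` from the PathoFact directory would mirror the manual steps; a return value of 0 from `patch_options` means the placeholder was not found and options.py should be inspected by hand.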

# Usage

## Input configuration
The input to the PathoFact pipeline consists of: (i) an amino acid fasta file of translated gene sequences for the prediction of toxins, virulence factors and antimicrobial resistance genes, (ii) a fasta file containing nucleotide sequences of the corresponding contigs for the prediction of MGEs, and (iii) a tab-delimited table with contig names in the first column and the corresponding gene names in the second column, used to combine the predictions.
Contig and gene names need to match the original names given in the fasta headers. Furthermore, make sure that no white space is present in the fasta headers.
All three input files used by the pipeline for one sample need to share the same sample name, followed by the suffix .faa (amino acid gene fasta file), .fna (nucleotide contig fasta file) or .contig (table with contig and gene names).
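
The requirements above can be checked before starting a run. The sketch below is not part of PathoFact; the helper names `fasta_headers` and `check_sample` are hypothetical, and it assumes that the names in the table must equal the full fasta header lines (everything after the ">").

```python
from pathlib import Path


def fasta_headers(path):
    """Return every header line of a fasta file, without the leading '>'."""
    with open(path) as handle:
        return [line[1:].rstrip("\r\n") for line in handle if line.startswith(">")]


def check_sample(directory, sample):
    """Return a list of problems found in the three input files of one sample."""
    base = Path(directory)
    genes = fasta_headers(base / (sample + ".faa"))
    contigs = fasta_headers(base / (sample + ".fna"))

    problems = []
    # Fasta headers must not contain white space.
    for header in genes + contigs:
        if any(ch.isspace() for ch in header):
            problems.append("white space in fasta header: %r" % header)

    # Table layout: column 1 = contig name, column 2 = gene name, tab-delimited.
    gene_names, contig_names = set(genes), set(contigs)
    with open(base / (sample + ".contig")) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore empty lines
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 2:
                problems.append("line %d: expected two tab-separated columns" % number)
                continue
            contig, gene = fields[0], fields[1]
            if contig not in contig_names:
                problems.append("line %d: contig %r not in .fna headers" % (number, contig))
            if gene not in gene_names:
                problems.append("line %d: gene %r not in .faa headers" % (number, gene))
    return problems
```

An empty list from `check_sample("/path/to/samples", "SAMPLE_A")` means that none of the checked problems were found; it does not guarantee that the pipeline will accept the files.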

To run PathoFact, the sample name is given in the config.yaml file at "input_file". If desired, more than one sample can be run simultaneously, for example:

* input_file: ["SAMPLE_A","SAMPLE_B"]

In "OUTDIR", the path to the samples is given; the PathoFact results are deposited in the same directory.
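
To illustrate how these two settings relate to the input files, the sketch below lists the files that would be expected for each sample. It is an illustration only: the helper names are hypothetical, the configuration is represented as a plain dictionary holding just these two keys, and it assumes that the three files of each sample lie directly inside the "OUTDIR" directory.

```python
from pathlib import Path

# Suffixes of the three input files required per sample.
SUFFIXES = (".faa", ".fna", ".contig")


def expected_inputs(config):
    """Map each sample name to the three input paths expected for it."""
    outdir = Path(config["OUTDIR"])
    samples = config["input_file"]
    if isinstance(samples, str):
        # Assumption: treat a single sample name like a one-element list.
        samples = [samples]
    # String concatenation keeps sample names that contain dots intact.
    return {s: [outdir / (s + suffix) for suffix in SUFFIXES] for s in samples}


def missing_inputs(config):
    """Return the expected input paths that do not exist on disk."""
    return [
        path
        for paths in expected_inputs(config).values()
        for path in paths
        if not path.is_file()
    ]
```

For example, `missing_inputs({"input_file": ["SAMPLE_A", "SAMPLE_B"], "OUTDIR": "/path/to/samples"})` returns the paths that still need to be provided under that assumed layout.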
