To run PathoFact you need to adjust some parameters in
sample: This is a list of sample names, e.g.
project: A unique project name which will be used as the name of the output directory in
datapath (see below).
datapath: Path to directory containing the sample data; the output directory will be created there.
workflow: PathoFact can run the complete pipeline (default) or a specific step:
- "complete": toxin + virulence + AMR + MGE prediction
- "Tox": toxin prediction
- "Vir": virulence prediction
- "AMR": antimicrobial resistance (AMR) & mobile genetic elements (MGEs) prediction
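Since the workflow parameter accepts only the four values listed above, it can be worth checking the value before starting a long run. The following is a minimal sketch, not part of PathoFact itself: the function name is_valid_workflow is hypothetical, and the exact, case-sensitive matching is an assumption about how the pipeline compares the value.

```shell
# Hypothetical helper (not shipped with PathoFact): succeeds only if the
# argument is one of the four workflow values listed above.
# Assumption: the value must match exactly, including case.
is_valid_workflow() {
    case "$1" in
        complete|Tox|Vir|AMR) return 0 ;;
        *) return 1 ;;
    esac
}

# Example value; replace with the value you intend to put in the config file.
workflow="Tox"

if is_valid_workflow "$workflow"; then
    echo "workflow '$workflow' is a listed option"
else
    echo "workflow '$workflow' is not one of: complete, Tox, Vir, AMR" >&2
fi
```

A check like this only catches typos in the value; it does not verify anything else in the config file.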
Basic command to run the pipeline using
    # activate the env
    conda activate PathoFact
    # run the pipeline
    # set <cores> to the number of cores to use, e.g. 10
    snakemake -s Snakefile --use-conda --reason --cores <cores> -p
NOTE: Add the parameter --dry-run to the command to see which steps would be executed without running them.
NOTE: Add --configfile <configfile.yaml> to the command to use a config file other than the default one.
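As an illustration of how these two options combine with the basic command, the sketch below assembles both variants as strings and prints them. The core count of 10 is the example value from the command above, and my_config.yaml is a hypothetical file name; substitute your own.

```shell
# Number of cores to pass to snakemake; 10 is the example value used above.
CORES=10
# Hypothetical name of an alternative config file.
CONFIG=my_config.yaml

# Basic command, as given above.
BASE="snakemake -s Snakefile --use-conda --reason --cores $CORES -p"

# Dry run: list the steps that would be executed, without running them.
DRY_RUN_CMD="$BASE --dry-run"

# Real run that reads an alternative config file.
RUN_CMD="$BASE --configfile $CONFIG"

# Print both commands for inspection.
echo "$DRY_RUN_CMD"
echo "$RUN_CMD"

# To execute one of them (in sh or bash, with the conda environment active
# and from the directory containing the Snakefile), run for example:
#   $DRY_RUN_CMD
```

Running the dry run first and the real run afterwards with the same config file is a cheap way to confirm that the sample names and paths are picked up as intended.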
NOTE: It is advised to run the pipeline on multiple CPUs, or on a machine with more memory available.
For more options, see the snakemake documentation.
Execution on a cluster
The pipeline can be run on a cluster using
The command can be found in the script
cluster.sh which can also be used to submit the jobs to the cluster.