Configuration
To run PathoFact, you need to adjust some parameters in config.yaml.
- sample: a list of sample names, e.g. sample: ["SAMPLE_A","SAMPLE_B"]
- project: a unique project name, used as the name of the output directory inside datapath (see below)
- datapath: path to the directory containing the sample data; the output directory will be created there
- workflow: PathoFact can run the complete pipeline (default) or a specific step:
  - "complete": toxin + virulence + AMR + MGE prediction
  - "Tox": toxin prediction
  - "Vir": virulence prediction
  - "AMR": antimicrobial resistance (AMR) & mobile genetic elements (MGEs) prediction
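Before starting a run, the four keys above can be sanity-checked with a small shell function. The sketch below is hypothetical and not part of PathoFact: the function name check_config, the file example_config.yaml and its values are placeholders, and only the keys listed above are checked. The config.yaml shipped with PathoFact contains further settings (and may group these keys under a section), so edit the values in that file rather than replacing it with this fragment; the check accepts indented keys for that reason.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of PathoFact: checks that the four documented
# keys are present and that workflow has one of the documented values.
# Leading whitespace is allowed so that nested (indented) keys are found too.

check_config() {
  local cfg="$1" key workflow
  for key in sample project datapath workflow; do
    if ! grep -q "^[[:space:]]*${key}:" "$cfg"; then
      echo "$cfg: missing key '$key'" >&2
      return 1
    fi
  done
  # Take the text between the first pair of double quotes,
  # so this assumes a quoted value such as workflow: "complete".
  workflow=$(awk -F'"' '/^[ \t]*workflow:/ {print $2; exit}' "$cfg")
  case "$workflow" in
    complete|Tox|Vir|AMR) echo "$cfg: OK (workflow=$workflow)" ;;
    *) echo "$cfg: unknown workflow '$workflow'" >&2; return 1 ;;
  esac
}

# Placeholder example: sample names, project and path are made up.
cat > example_config.yaml <<'EOF'
sample: ["SAMPLE_A","SAMPLE_B"]
project: my_project
datapath: /path/to/data
workflow: "complete"
EOF

check_config example_config.yaml
```

Running check_config config.yaml before the pipeline catches a misspelled workflow value (e.g. "tox" instead of "Tox") early, instead of partway through a Snakemake run.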
Execution
Basic command to run the pipeline using <cores> CPUs:
```bash
# activate the env
conda activate PathoFact
# run the pipeline
# set <cores> to the number of cores to use, e.g. 10
snakemake -s Snakefile --use-conda --reason --cores <cores> -p
```
NOTE: Add the parameter -n (or --dry-run) to the command to see which steps will be executed without running them.
NOTE: Add --configfile <configfile.yaml> to use a different config file than config.yaml.
NOTE: It is advised to run the pipeline on multiple CPUs and/or on machines with more memory.
For more options, see the Snakemake documentation.
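The options above can be combined, for example a dry run with an alternative config file before the real run. The function below is a minimal sketch and not part of PathoFact; its name run_pathofact and its argument order are assumptions. It only assembles the documented command, adding --configfile and -n when requested. It must be run inside the activated PathoFact environment, from the directory containing the Snakefile.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not part of PathoFact. Run it inside the activated
# PathoFact environment, in the directory that contains the Snakefile.
# Usage: run_pathofact <cores> [configfile] [dry]

run_pathofact() {
  local cores="${1:?number of cores required}"
  local configfile="${2:-}"
  local mode="${3:-run}"
  # The basic command from above, kept as an array so arguments stay intact
  local -a cmd=(snakemake -s Snakefile --use-conda --reason --cores "$cores" -p)

  # Use a different config file than config.yaml
  if [[ -n "$configfile" ]]; then
    cmd+=(--configfile "$configfile")
  fi
  # Dry run: only show which steps would be executed
  if [[ "$mode" == "dry" ]]; then
    cmd+=(-n)
  fi
  "${cmd[@]}"
}
```

After sourcing the file (e.g. source run_pathofact.sh, a hypothetical file name), run_pathofact 10 my_config.yaml dry lists the planned steps and run_pathofact 10 my_config.yaml starts the run. Passing "$(getconf _NPROCESSORS_ONLN)" as the first argument uses all online CPUs of the machine.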
Execution on a cluster
The pipeline can be run on a cluster using SLURM. The command can be found in the script cluster.sh, which can also be used to submit the jobs to the cluster:
```bash
sbatch cluster.sh
```
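The contents of cluster.sh are not reproduced here. As a rough illustration only, a single-node SLURM batch script for the same command could look like the sketch below. It is not the cluster.sh shipped with PathoFact; the job name, the resource values and the function name run_pathofact_job are placeholders. It relies on SLURM setting SLURM_CPUS_PER_TASK when --cpus-per-task is requested, and on conda shell.bash hook to make conda activate work in a non-interactive shell.

```bash
#!/usr/bin/env bash
#SBATCH --job-name=PathoFact
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=24:00:00
# Minimal sketch, NOT the cluster.sh shipped with PathoFact.
# The resource values above are placeholders; adjust them to your cluster.

run_pathofact_job() {
  # Make `conda activate` usable in a non-interactive batch shell
  eval "$(conda shell.bash hook)"
  conda activate PathoFact
  # Use as many cores as SLURM allocated to this task (1 if unset)
  snakemake -s Snakefile --use-conda --reason --cores "${SLURM_CPUS_PER_TASK:-1}" -p
}

# Only start the pipeline inside a SLURM job, not when the script
# is executed directly (e.g. on a login node).
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  run_pathofact_job
else
  echo "Not inside a SLURM job; submit this script with sbatch." >&2
fi
```

Tying --cores to SLURM_CPUS_PER_TASK keeps Snakemake from starting more parallel jobs than the allocation provides, so only the #SBATCH lines need changing to scale the run.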