NAMES = ["a", "b", "c"]
# at the top of the file
import os
import glob
print("These are the names: ")
print(NAMES)
rule all:
    input: "list_of_generated_output.txt"

rule generate_some_input:
    output: "{your_pattern}_in.txt"
    shell:
        """
        echo "{wildcards.your_pattern}" > {output}
        """

rule generate_some_output:
    input:
        first="{pattern}_first_in.txt",
        second="{pattern}_second_in.txt",
    output: "{pattern}_out.txt"
    message: "GENERATE PATTERNED {output} - Pattern: {wildcards.pattern}"
    shell:
        """
        cat {input.first} {input.second} > {output}
        """
# The `str(output)` is necessary as `open()` expects a `str` and not an `OutputFiles` object
rule list_generated_output:
    input: expand("{NAME}_out.txt", NAME=NAMES)
    output: "list_of_generated_output.txt"
    threads: 4
    shell:
        """
        echo 'These are all the "out" names' > {output}
        find ./ -name "*_out.txt" -print0 | xargs -0 -P {threads} -I{{}} cat {{}} >> {output}
        """
@@ -90,3 +90,14 @@ Use the `glob_wildcards()` function, s. https://hpc-carpentry.github.io/hpc-pyth
snakemake -s 06-python_run.smk
cat list_of_generated_output.txt
```
# Parallelization
## Simple parallelization
```
snakemake -s 07-threads.smk -p
cat list_of_generated_output.txt # All files should be in alphabetical order
snakemake -j 2 -s 07-threads.smk -p # Files can be listed *out*-of-order
```
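
Why the order can change: in `07-threads.smk` the `{threads}` value is passed to `xargs -P`, so with more than one core several `cat` processes append to the output at the same time and the lines land in completion order. The following is a minimal Python sketch of that effect, not part of the workflow. The `collect()` helper and the `DELAYS` values are hypothetical and only stand in for `xargs -P` and for files that take different amounts of time to read.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

NAMES = ["a", "b", "c"]
# Hypothetical delays (seconds) so that later names tend to finish first in parallel.
DELAYS = {"a": 0.2, "b": 0.1, "c": 0.0}


def collect(workers):
    """Return the names in the order in which the workers finished them."""
    finished = []
    lock = threading.Lock()

    def work(name):
        time.sleep(DELAYS[name])  # stand-in for running `cat` on one file
        with lock:                # appending mimics `>> {output}`
            finished.append(name)

    # Leaving the `with` block waits for all submitted tasks to complete.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name in NAMES:
            pool.submit(work, name)
    return finished


if __name__ == "__main__":
    print("1 worker: ", collect(1))  # tasks run one after another, in submission order
    print("3 workers:", collect(3))  # order depends on which task finishes first
```

With a single worker the tasks run one after another, so the result follows the submission order. With several workers the same names come back, but their order depends on timing and should not be relied on.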
## Cluster-scale parallelization
Demo using MUST/LeGeLiS sample