# Snakemake workflow: chipseq

[![Snakemake](https://img.shields.io/badge/snakemake-≥6.4.0-brightgreen.svg)](https://snakemake.github.io)

This [workflow](https://git-r3lab.uni.lu/aurelien.ginolhac/snakemake-chip-seq) is derived from the [Snakemake template](https://github.com/snakemake-workflows/chipseq) (itself a port of the [nextflow chipseq pipeline](https://nf-co.re/chipseq)) and performs ChIP-seq peak-calling, QC and differential analysis.


## Overview

As displayed in the final [**report**](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/report.html), the pipeline looks like this:

![](https://i.imgur.com/mNn5aYx.png)

## Install

On the [UL-HPC](https://hpc.uni.lu/), you need:

- `snakemake`, version `6.4` or later
- `singularity`, provided by the HPC as an **EasyBuild module**
- this workflow template

`snakemake` can be installed via conda, as described in its manual and summarized in the next sections.

### Install conda

- Follow the instructions from **Sarah Peter** in her [tutorial](https://r3.pages.uni.lu/school/snakemake-tutorial/#conda) for installing `Miniconda3` (a 90 MB download).

```bash
user@access $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
user@access $ bash ./Miniconda3-latest-Linux-x86_64.sh
```

Follow the instructions and press `ENTER` when prompted.

Enter the full path for the installation, such as `/home/users/username/tools/miniconda3`, where `username` is your login.

Answer `yes` to initialize `conda`. Note that this modifies your `.bashrc`; to activate it for the current session, run:

```bash
user@access $ source ~/.bashrc
```

The prompt then has the `(base)` prefix, indicating that the `base` environment is activated by default.
Note that this happens at each login. If you dislike it, you can disable it by removing the relevant part of your `.bashrc`,
between the lines `# >>> conda initialize >>>` and `# <<< conda initialize <<<`.
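
To preview what `.bashrc` would look like without that block before editing it, here is a minimal sketch. `strip_conda_init` is a hypothetical helper name, not part of conda; it only prints the result and leaves the file untouched.

```bash
# Hypothetical helper: print a file without the block added by `conda init`.
# The file itself is not modified.
strip_conda_init() {
    sed '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' "$1"
}

# Example: write the stripped version next to the original for review.
# strip_conda_init ~/.bashrc > ~/.bashrc.no-conda
```

Writing to a separate file first lets you compare both versions before replacing anything.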

### Install snakemake with mamba

`mamba` is the [recommended](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) tool for managing environments.

The following two commands install `mamba` in the default `base` environment, then `snakemake` in a dedicated environment named `snakemake`:

```bash
(base) user@access $ conda install -n base -c conda-forge mamba
(base) user@access $ mamba create -c conda-forge -c bioconda -n snakemake snakemake
```

Once this succeeds, activate the newly created environment and observe the new name in front of your prompt:

```bash
(base) user@access $ conda activate snakemake
(snakemake) user@access $ snakemake --version
```
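
To check that the reported version meets the `6.4` minimum, a small comparison helper can be used. This is only a sketch: `version_ge` is a hypothetical name, and it assumes GNU `sort` with the `-V` (version sort) option is available.

```bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V option to order version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Only runs the check when snakemake is on the PATH (environment activated).
if command -v snakemake >/dev/null 2>&1; then
    if version_ge "$(snakemake --version)" "6.4"; then
        echo "snakemake is recent enough"
    else
        echo "snakemake is older than 6.4, please update"
    fi
fi
```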

### Install the ChIP-seq template

In the destination folder of your choice, run the following commands:

```bash
VERSION="v0.0.9"
wget -qO- https://git-r3lab.uni.lu/aurelien.ginolhac/snakemake-chip-seq/-/archive/${VERSION}/snakemake-chip-seq-.tar.gz | tar xfz - --strip-components=1
```

This command downloads and extracts (without the root folder) the following files:

```
CHANGELOG.md
config/
Dockerfile
LICENSE
README.md
resources/
workflow/
```

You may delete `LICENSE`, `Dockerfile`, `CHANGELOG.md` and `README.md` if you wish;
`snakemake` does not use them at runtime.
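
To confirm that the extraction worked, you can check that the remaining folders are present. The sketch below assumes that `config/`, `resources/` and `workflow/` from the listing above are the ones that matter; `check_template` is a hypothetical helper name.

```bash
# Hypothetical helper: verify that the template folders exist in a directory.
# The directory defaults to the current folder.
check_template() {
    tpl_dir="${1:-.}"
    tpl_missing=0
    for tpl_entry in config resources workflow; do
        if [ ! -d "${tpl_dir}/${tpl_entry}" ]; then
            echo "missing: ${tpl_entry}/" >&2
            tpl_missing=1
        fi
    done
    return "${tpl_missing}"
}

# Example: check the current folder and print a hint on failure.
check_template . || echo "the template looks incomplete"
```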


#### (Optional) Useful aliases

The following lines can be added to your `.bashrc`.

The first three are handy shortcuts:

- `dag` draws the workflow graph into `dag.pdf`; it is often run to see which steps are to be re-run or not
- `smk` loads the necessary tools on **ULHPC** in interactive sessions
- the `complete` command loads the auto-completion for snakemake

Setting the `PYTHONNOUSERSITE` variable to `True` keeps Python from loading packages from your user site directory, so the packages of the active conda environment are used instead.

```bash
alias dag='snakemake --dag | dot -Tpdf > dag.pdf'
alias smk='conda activate snakemake && module load tools/Singularity'
complete -o bashdefault -C snakemake-bash-completion snakemake
# From Sarah Peter 
export PYTHONNOUSERSITE=True
```

##### (Optional) Update `snakemake`

With the *conda* `(base)` environment activated, you can update your `snakemake` version with:

`mamba upgrade -n snakemake snakemake`



## Usage

The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=snakemake-workflows/chipseq).


First, book resources from the access node, for example 1 hour and 6 cores:

```
si -c 6 -t 1:00:00
```

### Singularity

This is the easy part, as the [HPC team](https://hpc.uni.lu/about/team.html) provides ready-to-use [modules](https://hpc.uni.lu/users/software/).

Once on a `node`, the command is:

```bash
(base) user@access $ module load tools/Singularity
```

### Conda activate


```bash
(base) user@access $ conda activate snakemake
```

Note that the two steps above can be replaced by the alias `smk` if you added it to your `.bashrc`.

From logging in to the access machine to getting the resources and activating the environment, a session should look like this:

```bash
(base) aginolhac@access1.iris-cluster.uni.lux(14:05:02)-> 20:56): ~ $ si -c 6 -t 1:00:00
# salloc -p interactive --qos debug -C batch 
salloc: Pending job allocation 2424900
salloc: job 2424900 queued and waiting for resources
salloc: job 2424900 has been allocated resources
salloc: Granted job allocation 2424900
salloc: Waiting for resource configuration
salloc: Nodes iris-139 are ready for job
(base) aginolhac@iris-139(14:17:21)-> 29:51)(2424900 1N/T/1CN): ~ $ smk
(snakemake) aginolhac@iris-139(14:17:23)-> 29:49)(2424900 1N/T/1CN): ~ $
```

### Dry-run


```bash
(snakemake) user@access $ snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac"  -j 6 -n
```
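
The bind path in the command above is specific to one user. Here is a minimal sketch for building the same argument for any directory. `bind_arg` is a hypothetical helper, and `/scratch/users/${USER}` assumes that your scratch folder follows the same pattern as in the command above.

```bash
# Hypothetical helper: print a Singularity bind option that mounts a
# directory at the same path inside the container.
bind_arg() {
    printf '%s %s:%s' '-B' "$1" "$1"
}

# Assumption: the scratch folder is /scratch/users/<login>.
# snakemake --use-singularity --singularity-args "$(bind_arg "/scratch/users/${USER}")" -j 6 -n
```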


Note that the following messages:
```
Workflow defines that rule get_genome is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_annotation is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule genome_faidx is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule bwa_index is eligible for caching between workflows (use the --cache argument to enable this).
```

warn you that the **cache** option is not activated. The cache is useful if several of your workflows share these steps (genome and annotation download, indexing).


### Activate the cache 

The cache saves download and computation time between workflows:

1. Set-up

```bash
mkdir -p ${SCRATCH}/tmp
export SNAKEMAKE_OUTPUT_CACHE=${SCRATCH}/tmp
```

2. Run command

```bash
snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac"  --cache -j 6
```



3. If the previous step was successful, obtain the report by running:

```bash
snakemake --report
```
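
Before launching the run with `--cache`, it can help to verify that the set-up from step 1 is in place in the current session. Here is a minimal sketch, where `cache_ready` is a hypothetical helper name:

```bash
# Hypothetical helper: succeed only if the cache location is usable.
cache_ready() {
    if [ -z "${SNAKEMAKE_OUTPUT_CACHE:-}" ]; then
        echo "SNAKEMAKE_OUTPUT_CACHE is not set" >&2
        return 1
    fi
    if [ ! -d "${SNAKEMAKE_OUTPUT_CACHE}" ] || [ ! -w "${SNAKEMAKE_OUTPUT_CACHE}" ]; then
        echo "${SNAKEMAKE_OUTPUT_CACHE} is not a writable directory" >&2
        return 1
    fi
}

# Example: print a confirmation only when the check passes.
# cache_ready && echo "cache is ready, --cache can be used"
```

Since `export` only applies to the current session, this check is mostly useful after a new login or a new resource allocation.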


## Expected outputs

With real yeast data:

- the `snakemake` report: `report.html`, download a [copy here](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/report.html)
- the `multiqc` report: `results/qc/multiqc/multiqc_report.html`, download a [copy here](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/multiqc.html)
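
To check that a finished run produced both reports, a sketch along these lines can be used. The paths are the ones listed above; `check_reports` is a hypothetical helper name.

```bash
# Hypothetical helper: report which of the expected reports exist and are
# non-empty under a workflow folder (defaults to the current folder).
check_reports() {
    rep_dir="${1:-.}"
    rep_status=0
    for rep_file in report.html results/qc/multiqc/multiqc_report.html; do
        if [ -s "${rep_dir}/${rep_file}" ]; then
            echo "found:   ${rep_file}"
        else
            echo "missing: ${rep_file}"
            rep_status=1
        fi
    done
    return "${rep_status}"
}

# Example: check the folder where the workflow was run.
# check_reports /path/to/your/workflow
```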



# Credits

This template is derived from the official [Snakemake-workflow](https://github.com/snakemake-workflows/chipseq) by [Antonie Vietor](https://github.com/AntonieV) and [David Laehnemann](https://github.com/dlaehnemann).
It was initially a port of the [Next-Flow ChIP-seq](https://nf-co.re/chipseq) pipeline (https://doi.org/10.5281/zenodo.3240506) by [Harshil Patel](https://github.com/drpatelh) [et al.](https://github.com/nf-core/chipseq/graphs/contributors)