# Snakemake workflow: chipseq

[![Snakemake](https://img.shields.io/badge/snakemake-≥6.4.0-brightgreen.svg)](https://snakemake.github.io)

This [workflow](https://gitlab.lcsb.uni.lu/aurelien.ginolhac/snakemake-chip-seq) is derived from the [Snakemake template](https://github.com/snakemake-workflows/chipseq) (itself a port of the [nextflow chipseq pipeline](https://nf-co.re/chipseq)) and performs ChIP-seq peak-calling, QC and differential analysis.


## Overview

As displayed in the final [**report**](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/report.html), the pipeline looks like this:

![](https://i.imgur.com/mNn5aYx.png)

## Install

On the [UL-HPC](https://hpc.uni.lu/), you need:

- `snakemake`, version at least `6.4`
- `singularity`, provided by the HPC as an **EasyBuild module**
- this workflow template

You can install `snakemake` via `conda`, following the documentation (RTFM), as described below.

### Install conda

- Follow the instructions from **Sarah Peter** in her [tutorial](https://r3.pages.uni.lu/school/snakemake-tutorial/#conda) for installing `Miniconda3` (a 90 MB download).

```bash
user@access $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
user@access $ bash ./Miniconda3-latest-Linux-x86_64.sh
```

Follow the instructions and press `ENTER` when prompted.

You must enter the full path for the installation, like `/home/users/username/tools/miniconda3` where `username` is your login.

Answer `yes` to initialize `conda`. Note that your `.bashrc` will be modified; to activate it in the current session, run:

```bash
user@access $ source ~/.bashrc
```

The prompt then has the `(base)` prefix, indicating that the `base` environment is activated by default.
This happens at each login; if you dislike it, you can disable it by removing the relevant part of your `.bashrc`,
between the lines `# >>> conda initialize >>>` and `# <<< conda initialize <<<`.

### Install snakemake with mamba

`mamba` is the [recommended](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) tool for managing environments.

These two commands install `mamba` in the default `base` environment, then `snakemake` in a dedicated environment named `snakemake`:

```bash
(base) user@access $ conda install -n base -c conda-forge mamba
(base) user@access $ mamba create -c conda-forge -c bioconda -n snakemake snakemake
```

Once successful, activate the newly created environment and observe the new name before your prompt:

```bash
(base) user@access $ conda activate snakemake
(snakemake) user@access $ snakemake --version
```
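
The Install section asks for `snakemake` version at least `6.4`. A minimal sketch of a version check, assuming GNU `sort -V` (natural version ordering, part of coreutils on typical Linux systems) is available; the function name `version_ge` is hypothetical:

```bash
#!/usr/bin/env bash
# version_ge A B: succeed (exit 0) when version A >= version B.
# Relies on `sort -V`: if B sorts first (or A equals B), then A >= B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical usage with the installed snakemake:
# if version_ge "$(snakemake --version)" 6.4; then
#   echo "snakemake is recent enough"
# else
#   echo "please update snakemake to >= 6.4" >&2
# fi
```

Version ordering matters here: a plain string comparison would wrongly rank `6.10.0` below `6.4`.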

### Install the ChIP-seq template

Go to the destination folder of your choice, or create one, for example:

```bash
mkdir snakemake-chip-seq
cd snakemake-chip-seq
``` 

Then run the following commands:

```bash
VERSION="v0.1.1"
wget -qO- https://gitlab.lcsb.uni.lu/aurelien.ginolhac/snakemake-chip-seq/-/archive/${VERSION}/snakemake-chip-seq-${VERSION}.tar.gz | tar xfz - --strip-components=1
```

This command downloads and extracts (without the root folder) the following files:

```
CHANGELOG.md
config/
Dockerfile
LICENSE
README.md
resources/
workflow/
```

You may delete `LICENSE`, `Dockerfile`, `CHANGELOG.md` and `README.md` if you wish;
they are not used by `snakemake` at runtime.
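
Since only the remaining entries matter at runtime, a quick check that they were extracted can help. This is a sketch assuming that `config/`, `resources/` and `workflow/` are the folders to keep; the function name `check_template` is hypothetical:

```bash
#!/usr/bin/env bash
# check_template [DIR]: verify that the folders kept for runtime exist in DIR
# (default: current folder). Prints each missing folder to stderr, returns 1 if any.
check_template() {
  local root="${1:-.}" missing=0 d
  for d in config resources workflow; do
    if [ ! -d "${root}/${d}" ]; then
      echo "missing: ${d}/" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, from the destination folder after extraction:
# check_template && echo "template ready"
```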

### Fetch test datasets

The test data come from the [nf-core test datasets](https://github.com/nf-core/test-datasets); clone the `chipseq` branch with `git`:

```bash
git clone -b chipseq --depth 1 https://github.com/nf-core/test-datasets.git
```

#### (Optional) Useful aliases

The following lines can be added to your `.bashrc`.

The first three are handy shortcuts:

- `dag` draws the workflow DAG into `dag.pdf`, often used to see which steps will be re-run or not
- `smk` activates the `snakemake` environment and loads the Singularity module, needed on the **ULHPC** in interactive sessions
- the `complete` command loads the auto-completion for `snakemake`

Setting `PYTHONNOUSERSITE` prevents Python from using packages installed in your user site directory (`~/.local`), so that they do not supersede the packages of the active conda environment.

```bash
alias dag='snakemake --dag | dot -Tpdf > dag.pdf'
alias smk='conda activate snakemake && module load tools/Singularity'
complete -o bashdefault -C snakemake-bash-completion snakemake
# From Sarah Peter 
export PYTHONNOUSERSITE=True
```

##### (Optional) Update `snakemake`

With the `(base)` conda environment activated, you can update `snakemake` with:

`mamba update -c conda-forge -c bioconda -n snakemake snakemake`

## Usage

The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=snakemake-workflows/chipseq).

Book resources from the access node, for example 6 cores for 1 hour:

```bash
si -c 6 -t 1:00:00
```

### Singularity

This is the easy part, as the [HPC team](https://hpc.uni.lu/about/team.html) provides Singularity as one of the available [modules](https://hpc.uni.lu/users/software/).

Once on a compute node, the command is:

```bash
(base) user@access $ module load tools/Singularity
```

### Conda activate


```bash
(base) user@access $ conda activate snakemake
```

Of note, the two steps above can be replaced by the `smk` alias if you added it to your `.bashrc`.

From logging in on the access machine to getting the resources and activating the environment, it should look like this:

```bash
(base) aginolhac@access1.iris-cluster.uni.lux(14:05:02)-> 20:56): ~ $ si -c 6 -t 1:00:00
# salloc -p interactive --qos debug -C batch 
salloc: Pending job allocation 2424900
salloc: job 2424900 queued and waiting for resources
salloc: job 2424900 has been allocated resources
salloc: Granted job allocation 2424900
salloc: Waiting for resource configuration
salloc: Nodes iris-139 are ready for job
(base) aginolhac@iris-139(14:17:21)-> 29:51)(2424900 1N/T/1CN): ~ $ smk
(snakemake) aginolhac@iris-139(14:17:23)-> 29:49)(2424900 1N/T/1CN): ~ $
```

### Dry-run


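The `-n` flag performs a dry run: it lists the jobs to be run without executing them. Replace the bind path with your own scratch folder:

```bash
(snakemake) user@access $ snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac" -j 6 -n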
```
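
The bind path above is the author's scratch folder. A minimal sketch that builds the same `-B src:dest` argument from the `SCRATCH` variable (also used in the cache section) instead; the helper name `scratch_bind` is hypothetical:

```bash
#!/usr/bin/env bash
# scratch_bind [DIR]: print a singularity bind argument "-B DIR:DIR",
# defaulting to $SCRATCH. Returns 1 if no folder is given or set.
scratch_bind() {
  local dir="${1:-${SCRATCH:-}}"
  if [ -z "$dir" ]; then
    echo "SCRATCH is not set" >&2
    return 1
  fi
  printf -- '-B %s:%s\n' "$dir" "$dir"
}

# Dry-run with 6 cores, binding your own scratch folder:
# snakemake --use-singularity --singularity-args "$(scratch_bind)" -j 6 -n
```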


Note the following messages:
```
Workflow defines that rule get_genome is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_annotation is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule genome_faidx is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule bwa_index is eligible for caching between workflows (use the --cache argument to enable this).
```

They warn you that the **cache** option is not activated. Caching is useful if you run several workflows on the same reference genome.


### Activate the cache 

Caching saves download and computation time between workflows.

1. Set-up

```bash
mkdir -p ${SCRATCH}/tmp
export SNAKEMAKE_OUTPUT_CACHE=${SCRATCH}/tmp
```

2. Run command

```bash
snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac"  --cache -j 6
```

The following warnings can be ignored:

```
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
logs/fastqc/Spt5.1.1.log' which didn't exist, or couldn't be read
```

The full workflow takes less than 15 minutes; most steps finish within 5 minutes.
The rule `plot_fingerprint` is the longest step.

3. If the previous step was successful, you can obtain the report by running:

```bash
snakemake --report
```
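
The cache set-up from step 1 only lasts for the current shell session. A small sketch that wraps it, for example to place in your `.bashrc`, assuming as above that the cache lives in `${SCRATCH}/tmp`; the function name `setup_cache` is hypothetical:

```bash
#!/usr/bin/env bash
# setup_cache [BASE]: create BASE/tmp (default BASE: $SCRATCH) and export it
# as SNAKEMAKE_OUTPUT_CACHE, the folder used by `snakemake --cache`.
setup_cache() {
  local base="${1:-${SCRATCH:-}}"
  if [ -z "$base" ]; then
    echo "SCRATCH is not set" >&2
    return 1
  fi
  mkdir -p "${base}/tmp" || return 1
  export SNAKEMAKE_OUTPUT_CACHE="${base}/tmp"
}

# Usage, before running with --cache:
# setup_cache && echo "cache in ${SNAKEMAKE_OUTPUT_CACHE}"
```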


## Expected outputs

For the test data:

- the `snakemake` report: `report.html`, download a [copy here](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/report.html)
- the `multiqc` report: `results/qc/multiqc/multiqc_report.html`, download a [copy here](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/multiqc.html)

With real yeast data:

- the `snakemake` report: `report.html`, download a [copy here](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/report_yeast.html)
- the `multiqc` report: `results/qc/multiqc/multiqc_report.html`, download a [copy here](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/multiqc_yeast.html)
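
After a run, a quick way to check that both reports listed above were produced is sketched below; the function name `check_reports` is hypothetical and the paths are the ones given in this section, relative to the workflow folder:

```bash
#!/usr/bin/env bash
# check_reports [DIR]: print which expected HTML reports exist (and are non-empty)
# under DIR (default: current folder). Returns 1 if any is missing.
check_reports() {
  local root="${1:-.}" status=0 f
  for f in report.html results/qc/multiqc/multiqc_report.html; do
    if [ -s "${root}/${f}" ]; then
      echo "found:   ${f}"
    else
      echo "missing: ${f}"
      status=1
    fi
  done
  return "$status"
}
```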


# Credits

This template is derived from the official [Snakemake-workflow](https://github.com/snakemake-workflows/chipseq) by [Antonie Vietor](https://github.com/AntonieV) and [David Laehnemann](https://github.com/dlaehnemann).
It was initially a port of the [Nextflow ChIP-seq pipeline](https://nf-co.re/chipseq) (https://doi.org/10.5281/zenodo.3240506) by [Harshil Patel](https://github.com/drpatelh) [et al.](https://github.com/nf-core/chipseq/graphs/contributors).