# Snakemake workflow: chipseq

[![Snakemake](https://img.shields.io/badge/snakemake-≥6.4.0-brightgreen.svg)](https://snakemake.github.io)

This [workflow](https://git-r3lab.uni.lu/aurelien.ginolhac/snakemake-chip-seq) is derived from the [Snakemake template](https://github.com/snakemake-workflows/chipseq) (itself a port of the [nextflow chipseq pipeline](https://nf-co.re/chipseq)) and performs ChIP-seq peak-calling, QC and differential analysis.


## Overview

As displayed in the final [**report**](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/report.html), the pipeline looks like:

![](https://i.imgur.com/mNn5aYx.png)

## Install

On the [UL-HPC](https://hpc.uni.lu/), you need:

- `snakemake`, version `6.4` or later
- `singularity`, provided by the HPC as an **EasyBuild module**
- this workflow template

`snakemake` can be installed via `conda`, following the official documentation as summarised in the next sections.

### Install conda

- Follow the instructions from **Sarah Peter** in her [tutorial](https://r3.pages.uni.lu/school/snakemake-tutorial/#conda) for installing `Miniconda3` (a 90 MB download).

```bash
user@access $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
user@access $ bash ./Miniconda3-latest-Linux-x86_64.sh
```

Follow the instructions and press `ENTER` when prompted.

You must enter the full path for the installation, such as `/home/users/username/tools/miniconda3`, where `username` is your login.

Answer `yes` to initialize `conda`. Note that your `.bashrc` will be modified; to activate `conda` in the current session, run:

```bash
user@access $ source ~/.bashrc
```

The prompt then has the `(base)` prefix, indicating that the `base` environment is activated by default.
Note that this will happen at each login. If you dislike it, you can disable it by removing the relevant part of your `.bashrc`,
between the lines `# >>> conda initialize >>>` and `# <<< conda initialize <<<`.
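
Before editing `.bashrc`, it can help to see exactly which lines `conda` added. The following is a minimal sketch that prints the text between the two markers quoted above; the helper name `show_conda_block` is only illustrative and is not part of `conda`.

```bash
# show_conda_block FILE: print the block that `conda init` wrote to FILE,
# from the opening marker line to the closing marker line (both included).
show_conda_block() {
    sed -n '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</p' "$1"
}

# Inspect your own start-up file, if it exists
bashrc="${HOME:-}/.bashrc"
if [ -f "$bashrc" ]; then
    show_conda_block "$bashrc"
fi
```

If nothing is printed, the file contains no `conda` initialization block.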

### Install snakemake with mamba

`mamba` is the [recommended](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) tool for managing environments.

These two commands install `mamba` in the default `base` environment, then `snakemake` in a dedicated environment named `snakemake`:

```bash
(base) user@access $ conda install -n base -c conda-forge mamba
(base) user@access $ mamba create -c conda-forge -c bioconda -n snakemake snakemake
```

Once this has succeeded, activate the newly created environment and observe the new name in front of your prompt:

```bash
(base) user@access $ conda activate snakemake
(snakemake) user@access $ snakemake --version
```
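
Since the workflow requires `snakemake` 6.4 or later, the reported version can be compared against that minimum. This is a minimal sketch, assuming GNU `sort` (for its `-V` version ordering) is available; `version_ge` is a hypothetical helper name.

```bash
# version_ge A B: succeed when version A is greater than or equal to version B.
# Relies on `sort -V` (GNU coreutils) to order version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="6.4.0"
if command -v snakemake >/dev/null 2>&1; then
    installed="$(snakemake --version)"
    if version_ge "$installed" "$required"; then
        echo "snakemake ${installed} is recent enough"
    else
        echo "snakemake ${installed} is older than ${required}, please update" >&2
    fi
else
    echo "snakemake not found: is the 'snakemake' environment activated?" >&2
fi
```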

### Install the ChIP-seq template

Move to the destination folder of your choice, or create one, for example:

```bash
mkdir snakemake-chip-seq
cd snakemake-chip-seq
``` 

and run the following commands:

```bash
VERSION="v0.0.9"
wget -qO- https://git-r3lab.uni.lu/aurelien.ginolhac/snakemake-chip-seq/-/archive/${VERSION}/snakemake-chip-seq-.tar.gz | tar xfz - --strip-components=1
```

These commands download the archive and extract (without the root folder) the following files:

```
config/
CHANGELOG.md
Dockerfile
LICENSE
README.md
resources/
workflow/
```
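
To confirm that the extraction worked, you can check that the three folders listed above are present. This is a minimal sketch to run from the destination folder; `check_layout` is a hypothetical helper, not something shipped with the template.

```bash
# check_layout DIR: report each workflow folder missing from DIR
# and return a non-zero status if at least one is missing.
check_layout() {
    local dir="$1" name status=0
    for name in config resources workflow; do
        if [ ! -d "${dir}/${name}" ]; then
            echo "missing folder: ${name}" >&2
            status=1
        fi
    done
    return "$status"
}

if check_layout .; then
    echo "template extracted correctly"
else
    echo "extraction looks incomplete, re-run the download step" >&2
fi
```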

You may delete `LICENSE`, `Dockerfile`, `CHANGELOG.md` and `README.md` if you wish;
they are not used by `snakemake` at runtime.

### Fetch test datasets

The test data come from the [nextflow datasets](https://github.com/nf-core/test-datasets); clone the `chipseq` branch using `git`:

```bash
git clone -b chipseq --depth 1 https://github.com/nf-core/test-datasets.git
```


#### (Optional) Useful aliases

The following lines can be added to your `.bashrc`.

The first three are handy shortcuts:

- `dag` is often run to see which steps are to be re-run or not
- `smk` loads the necessary tools on **ULHPC** in interactive sessions
- the `complete` command loads the auto-completion for snakemake

Setting the `PYTHONNOUSERSITE` variable to `True` prevents Python from adding your user site-packages directory (under `~/.local`) to its search path, so that the packages of the active `conda` environment are used.

```bash
alias dag='snakemake --dag | dot -Tpdf > dag.pdf'
alias smk='conda activate snakemake && module load tools/Singularity'
complete -o bashdefault -C snakemake-bash-completion snakemake
# From Sarah Peter 
export PYTHONNOUSERSITE=True
```
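
As a quick check of what `PYTHONNOUSERSITE` does, the sketch below asks Python whether the user site-packages directory is enabled. The helper name `user_site_enabled` is only illustrative, and `python3` is assumed to be on the `PATH`.

```bash
# user_site_enabled: print Python's own view of whether the user
# site-packages directory is enabled (attribute of the standard `site` module).
user_site_enabled() {
    python3 -c 'import site; print(site.ENABLE_USER_SITE)'
}

if command -v python3 >/dev/null 2>&1; then
    export PYTHONNOUSERSITE=True
    user_site_enabled   # prints False once the variable is set
fi
```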

#### (Optional) Update `snakemake`

With the `(base)` *conda* environment activated, you can update your `snakemake` version with:

`mamba upgrade -n snakemake snakemake`

## Usage

The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=snakemake-workflows/chipseq).

Start by booking resources from the access node, for example 1 hour and 6 cores:

```
si -c 6 -t 1:00:00
```

### Singularity

This is the easy part, as the [HPC team](https://hpc.uni.lu/about/team.html) provides ready-to-use [modules](https://hpc.uni.lu/users/software/).

Once on a `node`, the command is:

```bash
(base) user@access $ module load tools/Singularity
```

### Conda activate


```bash
(base) user@access $ conda activate snakemake
```

Note that the two steps above can be replaced by the alias `smk` if you added it to your `.bashrc`.

A complete session, from logging in to the access machine to getting the resources and activating the environment, should look like this:

```bash
(base) aginolhac@access1.iris-cluster.uni.lux(14:05:02)-> 20:56): ~ $ si -c 6 -t 1:00:00
# salloc -p interactive --qos debug -C batch 
salloc: Pending job allocation 2424900
salloc: job 2424900 queued and waiting for resources
salloc: job 2424900 has been allocated resources
salloc: Granted job allocation 2424900
salloc: Waiting for resource configuration
salloc: Nodes iris-139 are ready for job
(base) aginolhac@iris-139(14:17:21)-> 29:51)(2424900 1N/T/1CN): ~ $ smk
(snakemake) aginolhac@iris-139(14:17:23)-> 29:49)(2424900 1N/T/1CN): ~ $
```
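
Before launching the workflow, you may want to confirm that both tools are reachable in the current session. This is a minimal sketch using only the shell built-in `command -v`; `missing_tools` is a hypothetical helper name.

```bash
# missing_tools NAME...: print, one per line, each command that is
# not found on the PATH of the current session.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing="$(missing_tools snakemake singularity)"
if [ -n "$missing" ]; then
    echo "not available in this session:" >&2
    echo "$missing" >&2
else
    echo "snakemake and singularity are both available"
fi
```

If one of them is reported, load the module and activate the environment again (or run `smk`).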

### Dry-run


```bash
(snakemake) user@access $ snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac"  -j 6 -n
```


Note the following messages:
```
Workflow defines that rule get_genome is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_annotation is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule genome_faidx is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule bwa_index is eligible for caching between workflows (use the --cache argument to enable this).
```

These messages warn you that the **cache** option is not activated. The cache is useful if you run several workflows that share these steps, as described in the next section.
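
The bind path in the dry-run command above belongs to one specific user. The sketch below builds the same `-B` argument for your own scratch folder. It assumes the cluster defines `SCRATCH` and otherwise falls back to the `/scratch/users/<login>` pattern seen above; `bind_arg_for` is a hypothetical helper.

```bash
# bind_arg_for DIR: build the singularity bind option that mounts DIR
# at the same path inside the container.
bind_arg_for() {
    local dir="${1%/}"    # drop a trailing slash, if any
    printf '%s %s:%s\n' "-B" "$dir" "$dir"
}

# Assumption: SCRATCH is set by the cluster; otherwise use the path pattern shown above
scratch_dir="${SCRATCH:-/scratch/users/${USER:-$(id -un)}}"
bind_arg="$(bind_arg_for "$scratch_dir")"

# Print the resulting dry-run command for inspection rather than running it
echo "snakemake --use-singularity --singularity-args \"${bind_arg}\" -j 6 -n"
```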


### Activate the cache 

To save download and computation time between workflows:

1. Set-up

```bash
mkdir -p ${SCRATCH}/tmp
export SNAKEMAKE_OUTPUT_CACHE=${SCRATCH}/tmp
```

2. Run command

```bash
snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac"  --cache -j 6
```


3. If the previous step was successful, you can obtain the report by running:

```bash
snakemake --report
```
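
The `export` from the set-up step only lasts for the current shell session, so it is worth checking the variable before running with `--cache` in a new session. This is a minimal sketch; `cache_ready` is a hypothetical helper name.

```bash
# cache_ready: succeed when SNAKEMAKE_OUTPUT_CACHE is set and points
# to an existing, writable directory.
cache_ready() {
    [ -n "${SNAKEMAKE_OUTPUT_CACHE:-}" ] &&
        [ -d "${SNAKEMAKE_OUTPUT_CACHE}" ] &&
        [ -w "${SNAKEMAKE_OUTPUT_CACHE}" ]
}

if cache_ready; then
    echo "cache directory: ${SNAKEMAKE_OUTPUT_CACHE}"
else
    echo "SNAKEMAKE_OUTPUT_CACHE is unset or not a writable directory" >&2
fi
```

To make the setting permanent, the `export` line can be added to your `.bashrc`.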


## Expected outputs

For the test data:

- the `snakemake` report: `report.html`, download a [copy here](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/report.html)
- the `multiqc` report: `results/qc/multiqc/multiqc_report.html`, download a [copy here](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/multiqc.html)

With real yeast data:

- the `snakemake` report: `report.html`, download a [copy here](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/report_yeast.html)
- the `multiqc` report: `results/qc/multiqc/multiqc_report.html`, download a [copy here](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/multiqc_yeast.html)
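
After a run, the two report paths listed above can be checked from the workflow folder. This is a minimal sketch; `missing_outputs` is a hypothetical helper, and the paths are the ones given in the lists above.

```bash
# missing_outputs DIR: print each expected report that is absent from DIR.
missing_outputs() {
    local dir="$1" file
    for file in report.html results/qc/multiqc/multiqc_report.html; do
        [ -f "${dir}/${file}" ] || echo "$file"
    done
}

not_found="$(missing_outputs .)"
if [ -z "$not_found" ]; then
    echo "both reports are present"
else
    echo "reports not found:" >&2
    echo "$not_found" >&2
fi
```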


## Credits

This template is derived from the official [Snakemake-workflow](https://github.com/snakemake-workflows/chipseq) by [Antonie Vietor](https://github.com/AntonieV) and [David Laehnemann](https://github.com/dlaehnemann).
That workflow was initially a port of the [Nextflow ChIP-seq](https://nf-co.re/chipseq) pipeline (https://doi.org/10.5281/zenodo.3240506) by [Harshil Patel](https://github.com/drpatelh) [et al.](https://github.com/nf-core/chipseq/graphs/contributors)