README.md 7.56 KB
Newer Older
AntonieV's avatar
AntonieV committed
1 2
# Snakemake workflow: chipseq

3
[![Snakemake](https://img.shields.io/badge/snakemake-≥6.4.0-brightgreen.svg)](https://snakemake.github.io)
AntonieV's avatar
AntonieV committed
4

Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
This [workflow](https://git-r3lab.uni.lu/aurelien.ginolhac/snakemake-chip-seq) is derived from [Snakemake template](https://github.com/snakemake-workflows/chipseq) (itself port of the [nextflow chipseq pipeline](https://nf-co.re/chipseq)) and performs ChIP-seq peak-calling, QC and differential analysis.


## Overview

As displayed in the final [**report**](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/report.html) the pipeline looks like:

![](https://i.imgur.com/mNn5aYx.png)

## Install

On the [UL-HPC](https://hpc.uni.lu/), what is needed is:


- `snakemake`, version at least `6.4`
- `singularity`, provided the HPC as an **EasyBuild module**.
- this workflow template

For snakemake, you could install it via conda
and installing snakemake as in the RFTM

### Install conda

- Follow the instructions from **Sarah Peter** in her [tutorial](https://r3.pages.uni.lu/school/snakemake-tutorial/#conda for installing `Miniconda3` (downloading 90 MB).

```bash
user@access $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
user@access $ bash ./Miniconda3-latest-Linux-x86_64.sh
```

Follow the instructions, press `ENTER`.

You must enter the full path for the installation, like `/home/users/username/tools/miniconda3` where `username` is your login.

And answer `yes` to initialize `conda`, note that your `.bashrc` will be modified, and for this session activated with

```bash
user@access source ~/.bashrc
```

the prompt then has the `(base)` prefix, indicating that the `base` environment is activated by default.
Of note, it will be at each login, you can disable this is you dislike by removing the relevant part in your `.bashrc` 
between the line `# >>> conda initialize >>>` and `# <<< conda initialize <<<`

### Install snakemake with mamba

`mamba` is the [recommended](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) tool for managing environments.

Those 2 commands installed `mamba` in the default `base` environment then `snakemake` in a dedicated environment named `snakemake`

```bash
(base) user@access $ conda install -n base -c conda-forge mamba
(base) user@access $ mamba create -c conda-forge -c bioconda -n snakemake snakemake
```

Once successfull, you need to activate the newly created environment, obsered the new name before you prompt:

```bash
(base) user@access $ conda activate snakemake
(snakemake) user@access $ snakemake --version
```

### Install the ChIP-seq template

In the destination folder of your choice, run the following commands:

```bash
VERSION="v0.0.9"
wget -qO- https://git-r3lab.uni.lu/aurelien.ginolhac/snakemake-chip-seq/-/archive/${VERSION}/snakemake-chip-seq-.tar.gz | tar xfz - --strip-components=1
```

this command will download, extract (without the root folder) the following files:

```
CHANGELOG.md
config/
Dockerfile
LICENSE
README.md
resources/
workflow/
```

you may want to delete the `LICENSE`, `Dockerfile`, `CHANGELOG.md` and `README.md` if you wish, 
they are not used by `snakemake` for runtime.


#### (Optional) Useful aliases

The following lines can be added to your `.bashrc`. 

The 3 first ones are handy shortcuts:

- `dag` is often run to see which steps are to be re-run or not
- `smk` to load necessary on **ULHPC** in interactive sessions
- the `complete` command loads the auto-completion for snakemake

The `PYTHONNOUSERSITE` variable sets to `True` ensures that your installed python packages superseed the default system installed ones. 

```bash
alias dag='snakemake --dag | dot -Tpdf > dag.pdf'
alias smk='conda activate snakemake && module load tools/Singularity'
complete -o bashdefault -C snakemake-bash-completion snakemake
# From Sarah Peter 
export PYTHONNOUSERSITE=True
```

##### (Optional) Update `snakemake`

Once `(base)` *conda* activated, you can update your `snakemake` version with:

`mamba upgrade -n snakemake snakemake`


AntonieV's avatar
AntonieV committed
119 120 121

## Usage

122
The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=snakemake-workflows/chipseq).
AntonieV's avatar
AntonieV committed
123

Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
124 125 126

After booking ressources on the access node, like for example here, 1 hour and 6 cores:

127
```
Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
128
si -c 6 -t 1:00:00
129
```
Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
130

Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
### Singularity

this is the easy part as the [HPC team](https://hpc.uni.lu/about/team.html) is providing us with available [modules](https://hpc.uni.lu/users/software/)

the command once on a `node` is:

```bash
(base) user@access module load tools/Singularity
```

### Conda activate


```bash
(base) user@access $ conda activate snakemake
```

Of note, the 2 above steps can be replaced by the alias `smk` if you added the alias in your `.bashrc`

Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
150 151 152 153 154 155 156 157 158 159 160 161 162 163 164
It should look like this from accessing the access machine to getting the resources and activating the environment:

```bash
(base) aginolhac@access1.iris-cluster.uni.lux(14:05:02)-> 20:56): ~ $ si -c 6 -t 1:00:00
# salloc -p interactive --qos debug -C batch 
salloc: Pending job allocation 2424900
salloc: job 2424900 queued and waiting for resources
salloc: job 2424900 has been allocated resources
salloc: Granted job allocation 2424900
salloc: Waiting for resource configuration
salloc: Nodes iris-139 are ready for job
(base) aginolhac@iris-139(14:17:21)-> 29:51)(2424900 1N/T/1CN): ~ $ smk
(snakemake) aginolhac@iris-139(14:17:23)-> 29:49)(2424900 1N/T/1CN): ~ $
```

165 166 167 168 169 170 171 172 173
### Fetch test datasets

Using the [nextflow datasets](https://github.com/nf-core/test-datasets), clone it using `git`:

```bash
git clone -b --depth 1 chipseq https://github.com/nf-core/test-datasets.git
```


Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189
### Dry-run


```bash
(snakemake) user@access $ snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac"  -j 6 -n
```


Of note the following messages:
```
Workflow defines that rule get_genome is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_annotation is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule genome_faidx is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule bwa_index is eligible for caching between workflows (use the --cache argument to enable this).
```

Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
190
are warning you that the **cache** option is not activated. This is useful is your use 
Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
191 192 193 194 195 196 197 198


### Activate the cache 

To save download and computation times in between workflows

1. Set-up

Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
199
```bash
Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
200 201 202 203 204 205
mkdir -p ${SCRATCH}/tmp
export SNAKEMAKE_OUTPUT_CACHE=${SCRATCH}/tmp
```

2. Run command

Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
206
```bash
Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
207 208 209 210
snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac"  --cache -j 6
```


Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
211

Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227
3. If the previous step was successful, one can obtain the report by running

```bash
snakemake --report
```


## Expected outputs

With real yeast data:

- the `snakemake` report: `report.html`, download a [copy here](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/report.html)
- the `multiqc` report: `results/qc/multiqc/multiqc_report.html`, download a [copy here](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/multiqc.html)



Aurélien Ginolhac's avatar
Aurélien Ginolhac committed
228 229 230 231 232
# Credits

This template is derived from the official [Snakemake-workflow](https://github.com/snakemake-workflows/chipseq) by [Antonie Vietor](https://github.com/AntonieV) and [David Laehnemann](https://github.com/dlaehnemann).
Initially a port of the [Next-Flow ChIP-seq](https://nf-co.re/chipseq) (https://doi.org/10.5281/zenodo.3240506) by [Harshil Patel](https://github.com/drpatelh) [et al.](https://github.com/nf-core/chipseq/graphs/contributors)