# Snakemake workflow: chipseq

[![Snakemake](https://img.shields.io/badge/snakemake-≥6.4.0-brightgreen.svg)](https://snakemake.github.io)

This [workflow](https://git-r3lab.uni.lu/aurelien.ginolhac/snakemake-chip-seq) is derived from the [Snakemake template](https://github.com/snakemake-workflows/chipseq) (itself a port of the [nextflow chipseq pipeline](https://nf-co.re/chipseq)) and performs ChIP-seq peak-calling, QC and differential analysis.


## Overview

As displayed in the final [**report**](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/report.html) the pipeline looks like:

![](https://i.imgur.com/mNn5aYx.png)

## Install

On the [UL-HPC](https://hpc.uni.lu/), you need:

- `snakemake`, version `6.4` or later
- `singularity`, provided by the HPC as an **EasyBuild module**
- this workflow template

For `snakemake`, you can install conda first and then install `snakemake` as described in its manual; both steps are detailed below.

### Install conda

- Follow the instructions from **Sarah Peter** in her [tutorial](https://r3.pages.uni.lu/school/snakemake-tutorial/#conda) for installing `Miniconda3` (a 90 MB download).

```bash
user@access $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
user@access $ bash ./Miniconda3-latest-Linux-x86_64.sh
```

Follow the installer's instructions and press `ENTER` when prompted.

You must enter the full path for the installation, such as `/home/users/username/tools/miniconda3`, where `username` is your login.

Answer `yes` to initialize `conda`. Note that your `.bashrc` will be modified; to apply the change in the current session, run:

```bash
user@access $ source ~/.bashrc
```

The prompt then has the `(base)` prefix, indicating that the `base` environment is activated by default.
Note that this happens at each login. If you dislike it, you can disable it by removing the relevant part of your `.bashrc`,
between the lines `# >>> conda initialize >>>` and `# <<< conda initialize <<<`.
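
As a minimal sketch of that removal, the hypothetical helper `strip_conda_init` below prints a file without the block between the two marker lines. It is demonstrated on a throwaway file rather than on your real `.bashrc`; the file content is made up for illustration.

```bash
#!/usr/bin/env bash
# Print the content of file $1 without the block added by `conda init`.
# (strip_conda_init is a hypothetical helper name, not part of conda.)
strip_conda_init() {
    sed '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' "$1"
}

# Example on a throwaway file, so that the real ~/.bashrc is left untouched.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
alias ll='ls -l'
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="placeholder"
# <<< conda initialize <<<
export EDITOR=vim
EOF

# Only the alias and the export lines are printed.
strip_conda_init "$demo"
rm -f "$demo"
```

If you prefer to keep the block but not activate `base` at login, `conda config --set auto_activate_base false` is an alternative worth considering.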

### Install snakemake with mamba

`mamba` is the [recommended](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) tool for managing environments.

These 2 commands install `mamba` in the default `base` environment, then `snakemake` in a dedicated environment named `snakemake`:

```bash
(base) user@access $ conda install -n base -c conda-forge mamba
(base) user@access $ mamba create -c conda-forge -c bioconda -n snakemake snakemake
```

Once successful, activate the newly created environment and observe the new name before your prompt:

```bash
(base) user@access $ conda activate snakemake
(snakemake) user@access $ snakemake --version
```
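
To check the reported version against the minimum of `6.4` required by this workflow, a small comparison can help. This is only a sketch: `version_ge` is a hypothetical helper, and it assumes GNU `sort` with the `-V` option is available (as on most Linux systems).

```bash
#!/usr/bin/env bash
# Return success (0) when version $1 is greater than or equal to version $2.
# (version_ge is a hypothetical helper; it relies on GNU `sort -V`.)
version_ge() {
    # sort -V orders version strings; the smallest one comes first.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Minimum version stated in this README.
REQUIRED="6.4"

if command -v snakemake >/dev/null 2>&1; then
    INSTALLED="$(snakemake --version)"
    if version_ge "$INSTALLED" "$REQUIRED"; then
        echo "snakemake $INSTALLED is recent enough (>= $REQUIRED)"
    else
        echo "snakemake $INSTALLED is too old, need >= $REQUIRED" >&2
    fi
else
    echo "snakemake not found: activate the 'snakemake' environment first" >&2
fi
```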

### Install the ChIP-seq template

In the destination folder of your choice, run the following commands:

```bash
VERSION="v0.0.9"
wget -qO- https://git-r3lab.uni.lu/aurelien.ginolhac/snakemake-chip-seq/-/archive/${VERSION}/snakemake-chip-seq-.tar.gz | tar xfz - --strip-components=1
```

This command downloads and extracts (without the root folder) the following files:

```
CHANGELOG.md
config/
Dockerfile
LICENSE
README.md
resources/
workflow/
```

You may delete `LICENSE`, `Dockerfile`, `CHANGELOG.md` and `README.md` if you wish;
they are not used by `snakemake` at runtime.
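
To confirm that the extraction worked, you could check for the three folders that remain once the optional files are deleted. The sketch below assumes that `config/`, `resources/` and `workflow/` are the ones needed at runtime; `check_template` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Check that directory $1 (default: current directory) contains the folders
# config/, resources/ and workflow/; list any that are missing.
# (check_template is a hypothetical helper, not shipped with the template.)
check_template() {
    local dir="${1:-.}" missing=0 entry
    for entry in config resources workflow; do
        if [ ! -d "$dir/$entry" ]; then
            echo "missing: $dir/$entry" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_template .; then
    echo "template looks complete"
else
    echo "template incomplete" >&2
fi
```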


#### (Optional) Useful aliases

The following lines can be added to your `.bashrc`. 

The first 3 lines are handy shortcuts:

- `dag` is often run to see which steps are to be re-run or not
- `smk` loads the necessary tools on **ULHPC** in interactive sessions
- the `complete` command loads the auto-completion for snakemake

Setting the `PYTHONNOUSERSITE` variable to `True` prevents Python from adding your user site-packages directory (under `~/.local`) to its search path, so that the packages of the active conda environment are used instead.

```bash
alias dag='snakemake --dag | dot -Tpdf > dag.pdf'
alias smk='conda activate snakemake && module load tools/Singularity'
complete -o bashdefault -C snakemake-bash-completion snakemake
# From Sarah Peter 
export PYTHONNOUSERSITE=True
```

#### (Optional) Update `snakemake`

Once the `(base)` *conda* environment is activated, you can update your `snakemake` version with:

`mamba upgrade -n snakemake snakemake`



## Usage

The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=snakemake-workflows/chipseq).


After booking resources from the access node, for example 1 hour and 6 cores here:

```
si -c 6 -t 1:00:00
```

### Singularity

This is the easy part, as the [HPC team](https://hpc.uni.lu/about/team.html) provides ready-to-use [modules](https://hpc.uni.lu/users/software/).

Once on a `node`, the command is:

```bash
(base) user@access $ module load tools/Singularity
```

### Conda activate


```bash
(base) user@access $ conda activate snakemake
```

Note that the 2 steps above can be replaced by the alias `smk` if you added it to your `.bashrc`.
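
Before launching anything, it can be useful to verify that both tools are really available in the session. The following is a minimal sketch with a hypothetical helper, `require_tools`, which only relies on the shell built-in `command -v`.

```bash
#!/usr/bin/env bash
# Report which of the given commands are not found in PATH.
# Returns 0 when all are available, 1 otherwise.
# (require_tools is a hypothetical helper name.)
require_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool" >&2
            status=1
        fi
    done
    return "$status"
}

if require_tools snakemake singularity; then
    echo "ready to run the workflow"
fi
```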

### Dry-run


```bash
(snakemake) user@access $ snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac"  -j 6 -n
```
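
The command above binds `/scratch/users/aginolhac`, which is specific to one user. As a sketch only, the hypothetical helper `smk_cmd` prints the same command line for another directory and core count so that you can review it before running it. The fallback path `/scratch/users/<login>` and the use of `SLURM_CPUS_PER_TASK` (set by Slurm when cores are requested with `-c`) are assumptions, not something this workflow defines.

```bash
#!/usr/bin/env bash
# Print the snakemake command line for a given bind directory and core count.
# $1: directory to bind inside the container, $2: number of cores,
# remaining arguments are appended as-is (for example -n or --cache).
# (smk_cmd is a hypothetical helper name.)
smk_cmd() {
    local bind="$1" cores="$2"
    shift 2
    printf 'snakemake --use-singularity --singularity-args "-B %s:%s" -j %s' \
        "$bind" "$bind" "$cores"
    if [ "$#" -gt 0 ]; then
        printf ' %s' "$@"
    fi
    printf '\n'
}

# Assumption: SCRATCH points to your scratch directory, as on ULHPC.
smk_cmd "${SCRATCH:-/scratch/users/${USER:-username}}" "${SLURM_CPUS_PER_TASK:-1}" -n
```

Printing rather than executing keeps the sketch harmless; once the line looks right, paste it into the terminal.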


Note that the following messages:
```
Workflow defines that rule get_genome is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_annotation is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule genome_faidx is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule bwa_index is eligible for caching between workflows (use the --cache argument to enable this).
```

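inform you that the outputs of these rules can be cached and reused between workflows, provided the `--cache` argument is enabled.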


### Activate the cache 

To save download and computation time between workflows:

1. Set-up

```
mkdir -p ${SCRATCH}/tmp
export SNAKEMAKE_OUTPUT_CACHE=${SCRATCH}/tmp
```

2. Run command

```
snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac"  --cache -j 6
```
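
The set-up step can be wrapped in a small function that also checks that the directory is writable before exporting the variable. This is a sketch under assumptions: `setup_cache` is a hypothetical helper, and the fallback to `TMPDIR` or `/tmp` is only there so that it runs outside ULHPC, where `SCRATCH` may be undefined.

```bash
#!/usr/bin/env bash
# Create the cache directory $1 if needed and export SNAKEMAKE_OUTPUT_CACHE.
# Returns non-zero when the directory cannot be created or written to.
# (setup_cache is a hypothetical helper name.)
setup_cache() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    [ -w "$dir" ] || return 1
    export SNAKEMAKE_OUTPUT_CACHE="$dir"
}

# SCRATCH is used as in the set-up step above; the fallback is an assumption.
if setup_cache "${SCRATCH:-${TMPDIR:-/tmp}}/tmp"; then
    echo "cache directory: $SNAKEMAKE_OUTPUT_CACHE"
fi
```

The variable only lasts for the current session; to make it permanent, the `export` line can be added to your `.bashrc`.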



# Credits

This template is derived from the official [Snakemake-workflow](https://github.com/snakemake-workflows/chipseq) by [Antonie Vietor](https://github.com/AntonieV) and [David Laehnemann](https://github.com/dlaehnemann).
The latter was initially a port of the [Nextflow ChIP-seq pipeline](https://nf-co.re/chipseq) (https://doi.org/10.5281/zenodo.3240506) by [Harshil Patel](https://github.com/drpatelh) [et al.](https://github.com/nf-core/chipseq/graphs/contributors)