Commit 283eb4b9 authored by Aurélien Ginolhac

expand README

parent ed8d61ce
......@@ -11,9 +11,10 @@ results/*
!.gitignore
!.gitattributes
!.editorconfig
!.travis.yml
!.test
!.test/config/*
!.test/ngs-test-data
!LICENSE
!README.md
!Dockerfile
!CHANGELOG.md
## v0.1.0
- added a changelog
## v0.0.9
### Main changes from the snakemake workflow template
- [Singularity](https://sylabs.io/singularity/) using a docker image published on [Docker hub](https://hub.docker.com/r/ginolhac/snake-chip-seq)
- [AdapterRemoval](https://adapterremoval.readthedocs.io/en/latest/) for trimming (replacement of `cutadapt`)
- add [FastqScreen](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/)
- drop `DESeq2` for contrasts in favor of the dedicated tool: [`DiffBind`](https://bioconductor.org/packages/release/bioc/html/DiffBind.html)
- don't import the IGV session in report, the HTML becomes too large
\ No newline at end of file
......@@ -140,10 +140,9 @@ RUN python3 -m pip install snakemake multiqc pandas pyBigWig pysam macs2 deeptoo
# R is 4.0.0
# BiocManager 1.30.16, for BioC 3.12, #FIXME SPP is 1.16.0 if fine, avoid installing 1.15.2 line 148
RUN install2.r --error littler BiocManager && \
/usr/local/lib/R/site-library/littler/examples/installBioc.r DESeq2 Rsubread apeglm pheatmap vsn Rsamtools && \
install2.r --error hexbin ggrepel cowplot tidyverse UpSetR # spp
/usr/local/lib/R/site-library/littler/examples/installBioc.r DESeq2 apeglm pheatmap vsn Rsamtools && \
install2.r --error hexbin ggrepel cowplot tidyverse UpSetR
# DESeq2 is 1.30.1
# Rsubread is 2.4.3
# apeglm 1.12.0
# Install phantompeakqualtools, SPP fixed version 1.15.2
......@@ -163,9 +162,7 @@ RUN cd /opt && \
unzip bowtie2-${BT2_VERSION}-linux-x86_64.zip && rm bowtie2-${BT2_VERSION}-linux-x86_64.zip && \
mv bowtie2-${BT2_VERSION}-linux-x86_64/bowtie2* /usr/local/bin/ && rm -rf bowtie2-${BT2_VERSION}-linux-x86_64
# clean up
# clean up
RUN apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
apt-get autoclean && \
......
......@@ -2,24 +2,191 @@
[![Snakemake](https://img.shields.io/badge/snakemake-≥6.4.0-brightgreen.svg)](https://snakemake.github.io)
This workflow is a Snakemake port of the [nextflow chipseq pipeline](https://nf-co.re/chipseq) and performs ChIP-seq peak-calling, QC and differential analysis.
This [workflow](https://git-r3lab.uni.lu/aurelien.ginolhac/snakemake-chip-seq) is derived from the [Snakemake template](https://github.com/snakemake-workflows/chipseq) (itself a port of the [nextflow chipseq pipeline](https://nf-co.re/chipseq)) and performs ChIP-seq peak-calling, QC and differential analysis.
## Overview
As displayed in the final [**report**](https://xsOdPxjHMEpp3hc:123456@owncloud.lcsb.uni.lu/public.php/webdav/report.html) the pipeline looks like:
![](https://i.imgur.com/mNn5aYx.png)
## Install
On the [UL-HPC](https://hpc.uni.lu/), what is needed is:
- `snakemake`, version at least `6.4`
- `singularity`, provided by the HPC as an **EasyBuild module**.
- this workflow template
For snakemake, you can install it via conda,
as described in the following sections.
### Install conda
- Follow the instructions from **Sarah Peter** in her [tutorial](https://r3.pages.uni.lu/school/snakemake-tutorial/#conda) for installing `Miniconda3` (a 90 MB download).
```bash
user@access $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
user@access $ bash ./Miniconda3-latest-Linux-x86_64.sh
```
Follow the instructions and press `ENTER`.
You must enter the full path for the installation, such as `/home/users/username/tools/miniconda3`, where `username` is your login.
Answer `yes` to initialize `conda`. Note that your `.bashrc` will be modified; to activate conda in the current session, run:
```bash
user@access $ source ~/.bashrc
```
The prompt then has the `(base)` prefix, indicating that the `base` environment is activated by default.
Of note, this happens at each login. If you dislike it, you can disable it by removing the relevant part of your `.bashrc`,
between the lines `# >>> conda initialize >>>` and `# <<< conda initialize <<<`.
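Before removing anything, it can help to print what sits between the two markers. The following is a minimal sketch; `show_conda_block` is a hypothetical helper name, and the file defaults to `~/.bashrc`.

```bash
# Print the section written between the two conda marker lines.
# show_conda_block is a hypothetical helper name, not part of conda.
show_conda_block() {
    # sed -n prints only the lines in the range delimited by the markers
    sed -n '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</p' "${1:-$HOME/.bashrc}"
}

# Usage: inspect the default ~/.bashrc when it exists
if [ -f "$HOME/.bashrc" ]; then
    show_conda_block
fi
```

As an alternative to editing the file, `conda config --set auto_activate_base false` keeps the initialization but stops activating `base` at login.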
### Install snakemake with mamba
`mamba` is the [recommended](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) tool for managing environments.
These two commands install `mamba` in the default `base` environment, then `snakemake` in a dedicated environment named `snakemake`:
```bash
(base) user@access $ conda install -n base -c conda-forge mamba
(base) user@access $ mamba create -c conda-forge -c bioconda -n snakemake snakemake
```
Once successful, you need to activate the newly created environment. Observe the new name in front of your prompt:
```bash
(base) user@access $ conda activate snakemake
(snakemake) user@access $ snakemake --version
```
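Since the workflow needs snakemake version `6.4` or later, the reported version can be compared against that minimum. This is a sketch only: `version_ge` is a hypothetical helper that relies on GNU `sort -V`, and the check is skipped when `snakemake` is not in the `PATH`.

```bash
# Succeeds when version $1 is greater than or equal to version $2.
# version_ge is a hypothetical helper; it relies on GNU `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed snakemake, if any, with the 6.4 minimum
if command -v snakemake >/dev/null 2>&1; then
    if version_ge "$(snakemake --version)" "6.4"; then
        echo "snakemake is recent enough"
    else
        echo "snakemake is older than 6.4, please update" >&2
    fi
fi
```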
### Install the ChIP-seq template
In the destination folder of your choice, run the following commands:
```bash
VERSION="v0.0.9"
wget -qO- https://git-r3lab.uni.lu/aurelien.ginolhac/snakemake-chip-seq/-/archive/${VERSION}/snakemake-chip-seq-.tar.gz | tar xfz - --strip-components=1
```
This command downloads the archive and extracts (without the root folder) the following files:
```
CHANGELOG.md
config/
Dockerfile
LICENSE
README.md
resources/
workflow/
```
You may delete `LICENSE`, `Dockerfile`, `CHANGELOG.md` and `README.md` if you wish;
they are not used by `snakemake` at runtime.
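To illustrate what "without the root folder" means, here is a small local sketch that needs no download. The archive and folder names (`project-v0`, `plain`, `stripped`) are throwaway examples, not files of this workflow.

```bash
# Build a tiny archive with a single root folder, then extract it twice
# to compare the layouts. All paths are throwaway, hypothetical examples.
demo="$(mktemp -d)"
mkdir -p "$demo/src/project-v0/config" "$demo/plain" "$demo/stripped"
echo "demo" > "$demo/src/project-v0/README.md"
tar czf "$demo/project.tar.gz" -C "$demo/src" project-v0

# Default extraction keeps the root folder: plain/project-v0/README.md
tar xzf "$demo/project.tar.gz" -C "$demo/plain"

# --strip-components=1 drops the first path element: stripped/README.md
tar xzf "$demo/project.tar.gz" -C "$demo/stripped" --strip-components=1

ls "$demo/plain" "$demo/stripped"
```

With the option, the files land directly in the destination folder, which is why the template can be unpacked in place in a folder of your choice.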
#### (Optional) Useful aliases
The following lines can be added to your `.bashrc`.
The first three are handy shortcuts:
- `dag` is often run to see which steps are to be re-run or not
- `smk` loads the necessary tools on **ULHPC** in interactive sessions
- the `complete` command loads the auto-completion for snakemake
Setting the `PYTHONNOUSERSITE` variable to `True` prevents Python from adding your user site-packages directory (under `~/.local`) to its search path, so the packages of the active conda environment are not shadowed by user-installed ones.
```bash
alias dag='snakemake --dag | dot -Tpdf > dag.pdf'
alias smk='conda activate snakemake && module load tools/Singularity'
complete -o bashdefault -C snakemake-bash-completion snakemake
# From Sarah Peter
export PYTHONNOUSERSITE=True
```
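The effect of `PYTHONNOUSERSITE` can be checked by asking Python whether the per-user site-packages directory is enabled. This is a small sketch that assumes `python3` is available and is skipped otherwise.

```bash
# Ask Python whether the per-user site-packages directory is enabled.
# With PYTHONNOUSERSITE set, site.ENABLE_USER_SITE is reported as False.
if command -v python3 >/dev/null 2>&1; then
    PYTHONNOUSERSITE=True python3 -c 'import site; print(site.ENABLE_USER_SITE)'
fi
```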
##### (Optional) Update `snakemake`
Once the conda `(base)` environment is activated, you can update your `snakemake` version with:
`mamba upgrade -n snakemake snakemake`
## Usage
The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=snakemake-workflows/chipseq).
After booking resources on the access node, for example 1 hour and 6 cores here:
```
snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac" -j 6
si -c 6 -t 1:00:00
```
### Singularity
This is the easy part, as the [HPC team](https://hpc.uni.lu/about/team.html) provides it among the available [modules](https://hpc.uni.lu/users/software/).
Once on a `node`, the command is:
```bash
(base) user@access $ module load tools/Singularity
```
### Conda activate
```bash
(base) user@access $ conda activate snakemake
```
Of note, the two steps above can be replaced by the alias `smk` if you added it to your `.bashrc`.
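Before launching the workflow, it may be worth confirming that both tools are reachable in the current session. The following is a sketch; `missing_tools` is a hypothetical helper name.

```bash
# Print the name of every requested command that is not found in PATH.
# missing_tools is a hypothetical helper name used for illustration.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Both tools should be available after `module load` and `conda activate`
missing="$(missing_tools singularity snakemake)"
if [ -n "$missing" ]; then
    echo "not available in this session:" $missing >&2
fi
```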
### Dry-run
```bash
(snakemake) user@access $ snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac" -j 6 -n
```
Of note, the following messages:
```
Workflow defines that rule get_genome is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_annotation is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule genome_faidx is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule bwa_index is eligible for caching between workflows (use the --cache argument to enable this).
```
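are warning you that the output of these rules can be cached and reused between workflows, provided the `--cache` argument is used.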
### Activate the cache
To save download and computation time between workflows:
1. Set-up
```
mkdir -p ${SCRATCH}/tmp
export SNAKEMAKE_OUTPUT_CACHE=${SCRATCH}/tmp
```
2. Run command
```
snakemake --use-singularity --singularity-args "-B /scratch/users/aginolhac:/scratch/users/aginolhac" --cache -j 6
```
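The two steps can be gathered in one helper so that the bind path is not hard-coded to one user. This is a sketch under assumptions: `smk_cached_cmd` is a hypothetical name, and it assumes that `$SCRATCH` points to your own scratch directory, which this README only shows for the cache location. The command is printed for review and not executed.

```bash
# Assemble the cached run from the two steps, with the bind path taken
# from $SCRATCH. smk_cached_cmd is a hypothetical helper name.
smk_cached_cmd() {
    scratch_dir="${1:-$SCRATCH}"
    cores="${2:-6}"
    if [ -z "$scratch_dir" ]; then
        echo "SCRATCH is not set" >&2
        return 1
    fi
    # Step 1: the cache directory must exist and be exported
    mkdir -p "$scratch_dir/tmp"
    export SNAKEMAKE_OUTPUT_CACHE="$scratch_dir/tmp"
    # Step 2: print the command so it can be reviewed before running it
    echo "snakemake --use-singularity --singularity-args \"-B $scratch_dir:$scratch_dir\" --cache -j $cores"
}

# Only print the command when SCRATCH is defined in this session
if [ -n "${SCRATCH:-}" ]; then
    smk_cached_cmd "$SCRATCH" 6
fi
```

Call the function directly, not inside `$(...)`, if the exported `SNAKEMAKE_OUTPUT_CACHE` should remain set in your session.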
# Credits
This template is derived from the official [Snakemake-workflow](https://github.com/snakemake-workflows/chipseq) by [Antonie Vietor](https://github.com/AntonieV) and [David Laehnemann](https://github.com/dlaehnemann).
Initially a port of the [Next-Flow ChIP-seq](https://nf-co.re/chipseq) (https://doi.org/10.5281/zenodo.3240506) by [Harshil Patel](https://github.com/drpatelh) [et al.](https://github.com/nf-core/chipseq/graphs/contributors)
## Main changes
- [Singularity](https://sylabs.io/singularity/) using a docker image publish on [Docker hub](https://hub.docker.com/r/ginolhac/snake-chip-seq)
- [AdapterRemoval](https://adapterremoval.readthedocs.io/en/latest/) for trimming (replacement of `cutadapt`)
- add [FastqScreen](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/)