Commit db386fd1 authored by Sarah Peter

Make VirtualBox default.

parent c770fae7
......@@ -25,7 +25,7 @@ In this tutorial you will learn how to run a [ChIP-seq](https://en.wikipedia.org
## Install conda
For this tutorial we will use the [`conda` package manager](https://www.anaconda.com/) to install the required tools.
> Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.
>
......@@ -39,58 +39,35 @@ It can encapsulate software and packages in environments, so you can have multip
We will use conda on two levels in this tutorial. First we use a conda environment to install and run snakemake. Second, inside the snakemake workflow we will define separate conda environments for each step.
### Set up VirtualBox
Independent of the operating system you are using, we recommend that you set up a VirtualBox virtual machine (VM) as described in our [instructions](virtualbox.md). This ensures that the conda installation does not affect your local environment and that all steps in the tutorial work as expected.

In case you still wish to install conda natively on your machine **at your own risk**, you can follow [these installation instructions](conda_installation.md).
### Install conda in the VirtualBox VM
* Start the VM in VirtualBox.
* Start `PowerShell` (Windows) or another terminal application (macOS, Linux).
* Connect to the VM with
    ```bash
    $ ssh -p 2222 tutorial@127.0.0.1
    ```
* The password is `lcsblcsb` if you have not changed it.

Once connected to the VM, follow these steps to install conda:
```bash
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ chmod u+x Miniconda3-latest-Linux-x86_64.sh
$ ./Miniconda3-latest-Linux-x86_64.sh
```
You need to specify your installation destination, e.g. `/home/tutorial/tools/miniconda3`. You must use the **full** path and can**not** use `$HOME/tools/miniconda3`. Answer `yes` to initialize Miniconda3.
The installation will modify your `.bashrc` to make conda directly available after each login. To activate the changes now, run
......@@ -100,11 +77,11 @@ $ source ~/.bashrc
<a name="env"></a>
## Set up the environment
1. Update conda to the latest version:
```bash
$ conda update conda
```
2. Create a new conda environment and activate it:
......@@ -127,17 +104,17 @@ $ source ~/.bashrc
(bioinfo_tutorial) $ conda install -c bioconda -c conda-forge snakemake-minimal
```
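Before moving on, you may want to confirm that both tools are reachable from the activated environment. The short Python script below is an optional sketch and not part of the original tutorial: it looks each command up on `PATH` with `shutil.which` and prints the first line of its `--version` output. The tool names `conda` and `snakemake` come from the steps above; the file name `check_tools.py` is only a suggestion.

```python
# check_tools.py (hypothetical helper): report whether conda and snakemake are on PATH.
import shutil
import subprocess
from typing import Optional


def tool_version(tool: str) -> Optional[str]:
    """Return the first line printed by `<tool> --version`, or None if `tool` is not on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    # Some tools print their version to stderr instead of stdout, so fall back to stderr.
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    for tool in ("conda", "snakemake"):
        version = tool_version(tool)
        print(f"{tool}: {version if version is not None else 'not found on PATH'}")
```

Run it with the environment's own Python, e.g. `(bioinfo_tutorial) $ python check_tools.py`. If `snakemake` is reported as not found, the `bioinfo_tutorial` environment is most likely not activated.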
<a name="snakemake"></a>
## Create snakemake workflow
> The Snakemake workflow management system is a tool to create **reproducible and scalable** data analyses. Workflows are described via a human readable, Python based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition. Finally, Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment.
>
> &mdash; <cite>[Snakemake manual](https://snakemake.readthedocs.io/en/stable/index.html)</cite>
Snakemake is a very useful tool if you need to combine multiple steps using different software into a coherent workflow. It comes with many features that are useful for running workflows, such as
* ensuring all input and result files are present
* restarting at a failed step
......@@ -158,7 +135,7 @@ We will set up the following workflow:
To reduce computing time, we use source files that contain only the sequencing reads that map to chromosome 12. The files for input (control) and H3K4me3 (ChIP) are available on our [WebDAV](https://webdav-r3lab.uni.lu/public/biocore/snakemake_tutorial/) server.
Create a working directory and download the necessary data:
```bash
$ mkdir bioinfo_tutorial
......@@ -195,10 +172,10 @@ Your working directory should have the following layout now (using the `tree` co
### Mapping
> In Snakemake, workflows are specified as Snakefiles. Inspired by GNU Make, a Snakefile contains rules that denote how to create output files from input files. Dependencies between rules are handled implicitly, by matching filenames of input files against output files. Thereby wildcards can be used to write general rules.
>
> &mdash; <cite>[Snakemake manual - Writing Workflows](https://snakemake.readthedocs.io/en/stable/snakefiles/writing_snakefiles.html)</cite>
> Most importantly, a rule can consist of a name (the name is optional and can be left out, creating an anonymous rule), input files, output files, and a shell command to generate the output from the input, i.e.
> ```python
> rule NAME:
> input: "path/to/inputfile", "path/to/other/inputfile"
......@@ -237,7 +214,7 @@ rule mapping:
"""
```
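The manual excerpt quoted above says that Snakemake links rules by matching the filenames it needs against the rules' output patterns, which may contain wildcards. As a rough mental model only (this is not Snakemake's code), that matching step can be sketched in plain Python: turn an output pattern such as `results/mapped/{sample}.bam` into a regular expression, match the requested file, and fill the inferred wildcard values into the input patterns. The `.+` used for each wildcard mirrors Snakemake's default wildcard regex; the function names and paths are hypothetical.

```python
import re
from typing import Dict, List, Optional


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn 'dir/{name}.ext' into a regex with one named group per wildcard."""
    regex = ""
    seen = set()
    # re.split with a capturing group alternates literal text and wildcard names.
    for i, part in enumerate(re.split(r"\{(\w+)\}", pattern)):
        if i % 2 == 0:
            regex += re.escape(part)          # literal text, matched verbatim
        elif part in seen:
            regex += f"(?P={part})"           # a repeated wildcard must take the same value
        else:
            seen.add(part)
            regex += f"(?P<{part}>.+)"        # first occurrence: match one or more characters
    return re.compile(regex)


def match_output(output_pattern: str, target: str) -> Optional[Dict[str, str]]:
    """Return the wildcard values if `target` matches the output pattern, else None."""
    m = pattern_to_regex(output_pattern).fullmatch(target)
    return m.groupdict() if m else None


def required_inputs(output_pattern: str, input_patterns: List[str], target: str) -> Optional[List[str]]:
    """Fill the input patterns with the wildcard values inferred from `target`."""
    wildcards = match_output(output_pattern, target)
    if wildcards is None:
        return None  # this rule cannot produce `target`
    return [p.format(**wildcards) for p in input_patterns]


if __name__ == "__main__":
    # Hypothetical patterns for illustration; the tutorial's actual paths may differ.
    print(required_inputs("results/mapped/{sample}.bam", ["reads/{sample}.fastq.gz"], "results/mapped/chip.bam"))
    # → ['reads/chip.fastq.gz']
```

Snakemake repeats this kind of lookup backwards from the requested targets until it reaches files that already exist, which is how the order of rules follows from filenames alone. The real implementation additionally handles wildcard constraints, multiple outputs, and ambiguity between rules.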
Now we need to tell snakemake to run this rule in a conda environment that contains bowtie2 and [samtools](http://www.htslib.org/). For this purpose, there is a specific `conda` directive that can be added to the rule. It accepts a [yaml](https://yaml.org/spec/1.2/spec.html) file that defines the conda environment.
```python
conda: "envs/bowtie2.yaml"
......@@ -348,7 +325,7 @@ After this step your working directory should contain the following files:
### Peak calling
The next step in the workflow is to call peaks with [`MACS2`](https://github.com/taoliu/MACS). This tells us where the ChIP signal is enriched compared to the input (control).
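To make "enrichment" more concrete, here is a deliberately simplified sketch, not MACS2's actual algorithm. It compares the ChIP read count in one genomic window with the control count scaled to the ChIP library size, and asks how surprising the ChIP count would be under a Poisson model with that expected rate. MACS2 also builds on a Poisson model, but it additionally estimates the fragment size, shifts reads accordingly, and takes the rate λ as the maximum of a genome-wide background and several control-based local estimates. The window counts and the `min_lambda` floor below are assumptions for illustration.

```python
import math
from typing import Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X < k)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0); underflows for very large lam, fine for small sketch values
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def window_enrichment(chip_count: int, control_count: int,
                      chip_total: int, control_total: int,
                      min_lambda: float = 1.0) -> Tuple[float, float]:
    """Return (fold enrichment, Poisson p-value) of ChIP over scaled control in one window."""
    if chip_total <= 0 or control_total <= 0:
        raise ValueError("library sizes must be positive")
    # Scale the control count to the ChIP library size so both are comparable.
    expected = control_count * (chip_total / control_total)
    # Floor the rate so windows without control reads do not yield a rate of ~0.
    lam = max(expected, min_lambda)
    return chip_count / lam, poisson_sf(chip_count, lam)


if __name__ == "__main__":
    fold, p = window_enrichment(chip_count=30, control_count=10,
                                chip_total=1_000_000, control_total=2_000_000)
    print(f"fold enrichment = {fold:.1f}, p = {p:.2e}")
```

Real peak callers report p- and q-values in log space, because `1 - cdf` loses precision for very small p-values.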
You should always choose the peak caller based on what you expect your enriched regions to look like, e.g. narrow or broad peaks.
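With MACS2, the narrow-versus-broad choice is a command-line switch: by default `macs2 callpeak` reports narrow peaks (a `.narrowPeak` file), and `--broad` makes it call broad regions (a `.broadPeak` file). The hypothetical helper below only assembles the command as an argument list, which can be passed to `subprocess.run` without shell quoting issues. The flags (`-t`, `-c`, `-f`, `-g`, `-n`, `--outdir`, `--broad`) are real `callpeak` options; the file names and the genome-size value are assumptions for illustration.

```python
from typing import List


def macs2_callpeak_cmd(chip_bam: str, control_bam: str, name: str, outdir: str,
                       genome_size: str, broad: bool = False) -> List[str]:
    """Build a `macs2 callpeak` command for a ChIP sample and its input control."""
    cmd = [
        "macs2", "callpeak",
        "-t", chip_bam,       # treatment (ChIP) alignments
        "-c", control_bam,    # control (input) alignments
        "-f", "BAM",          # format of the alignment files
        "-g", genome_size,    # effective genome size, e.g. "hs", "mm" or a number
        "-n", name,           # prefix for the output files
        "--outdir", outdir,   # directory for the output files
    ]
    if broad:
        cmd.append("--broad")  # call broad regions instead of narrow peaks
    return cmd


if __name__ == "__main__":
    print(" ".join(macs2_callpeak_cmd("chip.bam", "input.bam", "H3K4me3", "peaks", "hs")))
    # → macs2 callpeak -t chip.bam -c input.bam -f BAM -g hs -n H3K4me3 --outdir peaks
```

Which genome size to pass is a separate decision; since the tutorial data contain only reads mapping to chromosome 12, check the MACS2 documentation before relying on a preset such as `hs`.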
......