Comparing genome and gene reconstruction when using short reads (SR; Illumina) only, long reads (LR; Oxford Nanopore Technology, ONT) only, and a hybrid approach (Hy).
If the path specified above does not exist, you can create the environment from `requirements.yaml`
and replace the environment path in the `sbatch.sh` files with `"ONT_pilot"`.
```bash
# creates the conda environment "ONT_pilot"
conda env create -f requirements.yaml
```
All config files are stored in the folder `config/`: sub-folders contain files for all the samples and/or pipeline steps.
- samples: `gdb`, `nwc`, `rumen`, `zymo`
- pipeline steps: `rawdata`
*TODO: remove aquifer and gcall*
The sub-folders contain:
- config YAML file(s) (`config(.substep)?.yaml`) for a `Snakemake` workflow
- `slurm` config YAML file(s) (`slurm(.substep)?.yaml`) defining job submission parameters for a `Snakemake` workflow
- bash script(s) to execute a `Snakemake` workflow (`sbatch(.substep)?.sh`)
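The naming scheme above can be illustrated with a small helper. This is only a sketch: `smk_files` is a hypothetical function, not part of this repository, and it assumes that a sub-folder holds `config.yaml`, `slurm.yaml` and `sbatch.sh`, or `config.<substep>.yaml`, `slurm.<substep>.yaml` and `sbatch.<substep>.sh` when a sub-step name is given.
```bash
#!/usr/bin/env bash
# Print the config, slurm and sbatch file paths expected in one sub-folder of
# config/, following the pattern <kind>(.substep)?.<ext>.
# Hypothetical helper; sub-step names passed to it are examples only.
smk_files() {
    local folder="$1"          # sample or pipeline step, e.g. "rawdata" or "zymo"
    local substep="${2:-}"     # optional sub-step name
    local suffix=""
    if [[ -n "$substep" ]]; then
        suffix=".${substep}"
    fi
    printf 'config/%s/config%s.yaml\n' "$folder" "$suffix"
    printf 'config/%s/slurm%s.yaml\n'  "$folder" "$suffix"
    printf 'config/%s/sbatch%s.sh\n'   "$folder" "$suffix"
}

# Usage: report any expected file of the raw data step that is missing
for f in $(smk_files rawdata); do
    [[ -f "$f" ]] || echo "missing: $f" >&2
done
```
The `slurm` file is only needed when jobs are submitted via `slurm`, so a "missing" message for it can be ignored otherwise.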
# Workflows
1. Download public datasets (*TODO: what about GDB?*)
2. Run the FAST5 workflow (per sample)
3. Run the main analysis workflow (per sample)
4. Create reports (per sample)
5. Create figures for the paper
The relevant parameters that have to be changed are listed for each workflow and config file.
Parameters defining system-relevant settings are not listed but should also be changed if required, e.g. the number of threads used by certain tools.
## Raw data workflow
Download raw data required for the analysis.
*TODO: remove aquifer*
- config: `config/rawdata/`
    - `config.yaml`:
        - change `work_dir`
    - `sbatch.sh`:
        - change `SMK_ENV`
        - if not using `slurm` to submit jobs, remove `--cluster-config` and `--cluster` from the `snakemake` command
    - `slurm.yaml` (only relevant if using `slurm` for job submission)
- workflow: `workflow_rawdata/`
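The two edits listed above can also be scripted. A minimal sketch, assuming that `config.yaml` contains a top-level line starting with `work_dir:` and that `sbatch.sh` sets the environment on a line starting with `SMK_ENV=`; the function name `set_rawdata_params` and the example values are hypothetical, so check both files before running it.
```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: set work_dir in config.yaml and SMK_ENV in sbatch.sh.
# Assumes "work_dir: ..." and "SMK_ENV=..." each start their own line.
# The original files are kept as config.yaml.bak and sbatch.sh.bak.
set_rawdata_params() {
    local cfg_dir="$1"      # e.g. config/rawdata
    local work_dir="$2"     # new working directory
    local smk_env="$3"      # env name (e.g. ONT_pilot) or full env path
    # Replace the value of the top-level work_dir key
    sed -i.bak "s|^work_dir:.*|work_dir: \"${work_dir}\"|" "${cfg_dir}/config.yaml"
    # Replace the SMK_ENV assignment in the submission script
    sed -i.bak "s|^SMK_ENV=.*|SMK_ENV=\"${smk_env}\"|" "${cfg_dir}/sbatch.sh"
}

# Usage (placeholder path):
# set_rawdata_params config/rawdata /path/to/work_dir ONT_pilot
```
Because the values are inserted into a `sed` expression, they must not contain `|` or `&`.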
## FAST5 workflow
Process the raw FAST5 files of a sample:
- create a multi-FAST5 file from single-FAST5 files
- perform basecalling
This step is **not** required if the long-read FASTQ file is already available.
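Whether this step can be skipped can be checked before submitting anything. A hedged sketch: `needs_fast5_workflow` is a hypothetical helper and the FASTQ path in the usage line is a placeholder; it only tests that the file exists and is non-empty, not that it is a valid FASTQ file.
```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if the FAST5 workflow still has
# to be run, i.e. if no non-empty long-read FASTQ file exists at the given path.
needs_fast5_workflow() {
    local fastq="$1"
    if [[ -s "$fastq" ]]; then
        return 1    # FASTQ already available: skip FAST5 processing
    fi
    return 0        # FASTQ missing or empty: run the FAST5 workflow
}

# Usage (placeholder path):
# if needs_fast5_workflow data/zymo/lr.fastq.gz; then echo "run the FAST5 workflow"; fi
```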