Commit d0c44b5b authored by Valentina Galata

readme: updated (issue #127)

parent eb8dc9d2
......@@ -2,11 +2,9 @@
Comparing genome and gene reconstruction when using short reads (SR) (Illumina) only, long reads (LR) (Oxford Nanopore Technology) only and a hybrid approach (Hy).
# Setup
See `README_REPROD.md` for additional information on how to run the analysis on a system other than the University of Luxembourg's HPC server.
```bash
git clone --recurse-submodules https://git-r3lab.uni.lu/susheel.busi/ont_pilot_gitlab
```
# Setup
## Conda
......@@ -146,5 +144,6 @@ snakemake -s workflow_figures/Snakefile --cores 1 --configfile config/fig.yaml -
Notes about manual analyses done using the generated data.
- `gdb_mmseqs2_uniq_metat.md`: sample GDB, unique genes and their metaT coverage
- `gdb_mmseqs2_highconf.md`: sample GDB, "high-confidence" proteins and protein clusters (assembly intersection and metaT cov.)
- `gdb_rgi_aro3004454_flye.md`, `gdb_rgi_other.md`: sample GDB, RGI examples for diff. between SR/Hy and LR, and metaT coverage
- `gdb_barrnap_metat.md`: sample GDB, metaT cov. of some predicted rRNA genes (barrnap)
\ No newline at end of file
......@@ -6,7 +6,6 @@ Notes for those trying to reproduce the results from the publication.
Download
- the code archive from XXX and extract it
- DIAMOND DB from XXX and extract it
- GDB data from XXX
Install `conda` and create the main `snakemake` environment (see `README.md`).
......@@ -22,15 +21,23 @@ git clone https://github.com/CSB5/OPERA-MS/tree/c18b4f3c933603a7b35d0ea601a80417
All database files need to be in the same folder (can also be symlinks) defined in all sample config files in `db_dir`.
Some databases will be downloaded/created by the pipeline and some are not required (see below).
The used UniProtKB/TrEMBL database (DIAMOND format) needs to be downloaded and the name of the `*.dmnd` file has to be set in all sample config files in `diamond:db`.
Other database file names/paths can be defined as empty strings or lists
The following database file names/paths can be defined as empty strings or lists in the sample config files:
- `bbmap:rrna_refs` as empty list (`bbmap:host_refs` can be kept unchanged)
- `hmm:kegg` as empty string
- `kraken2:db` as empty string
- `kaiju:db` as empty string
- `GTDBTK:DATA` as empty string
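
As an illustration of the list above, a sample config excerpt might look like the following. This is only a sketch: the nesting is assumed from the `section:key` notation, the `db_dir` path is a placeholder, and the actual sample config files may contain other keys and a different layout.

```yaml
# Hypothetical excerpt of a sample config file (layout assumed, not taken from the repository)
db_dir: "/path/to/databases"   # placeholder: folder holding all database files or symlinks
diamond:
  db: "uniprot_trembl.dmnd"    # name of the DIAMOND database file inside db_dir
bbmap:
  rrna_refs: []                # empty list; host_refs can be kept unchanged
hmm:
  kegg: ""                     # empty string
kraken2:
  db: ""
kaiju:
  db: ""
GTDBTK:
  DATA: ""
```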
### UniProtKB/TrEMBL database (DIAMOND format)
```bash
# may require a large amount of RAM
wget -q ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz
diamond makedb --in uniprot_trembl.fasta.gz -d uniprot_trembl
```
The name of the created `*.dmnd` file has to be set in all sample config files in `diamond:db`.
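
Since all database files have to sit in the folder given by `db_dir` (symlinks are allowed), the created file can be linked there instead of copied. The helper below is a minimal sketch of that step; `link_db` is a hypothetical function that is not part of the pipeline, and the paths in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: link a database file into the shared database folder (db_dir).
# link_db is a hypothetical helper, not part of the pipeline.
link_db() {
    local src="$1" db_dir="$2"
    # Refuse to create a dangling link
    if [ ! -e "$src" ]; then
        echo "missing database file: $src" >&2
        return 1
    fi
    mkdir -p "$db_dir"
    # Resolve the absolute path of the source so the link works from any directory
    local abs_src
    abs_src="$(cd "$(dirname "$src")" && pwd)/$(basename "$src")"
    # -s: symbolic link, -f: replace an existing link with the same name
    ln -sf "$abs_src" "$db_dir/$(basename "$src")"
}

# Usage (placeholder paths):
#   link_db /path/to/uniprot_trembl.dmnd /path/to/databases
```

The file name of the link is unchanged, so the value set in `diamond:db` stays the same whether the file is copied or linked.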
## Workflows
The bash scripts to run the `snakemake` pipelines assume that `slurm` is used to submit the jobs.
......