Commit b3934134 authored by Valentina Galata

readme: updated notes; added file for results archive

parent eeac8cad
......@@ -143,7 +143,8 @@ snakemake -s workflow_figures/Snakefile --cores 1 --configfile config/fig.yaml -
Notes about manual analyses done using the generated data.
- `gdb_mmseqs2_uniq_metat.md`: sample GDB, unique genes and their metaT coverage
- `gdb_mmseqs2_highconf.md`: sample GDB, "high-confidence" proteins and protein clusters (assembly intersection and metaT cov.)
- `gdb_rgi_aro3004454_flye.md`, `gdb_rgi_other.md`: sample GDB, RGI examples for diff. between SR/Hy and LR, and metaT coverage
- `gdb_barrnap_metat.md`: sample GDB, metaT cov. of some predicted rRNA genes (barrnap)
\ No newline at end of file
- `gdb_rgi_aro3004454_flye.md`: sample GDB, an RGI example for diff. between SR/Hy and LR, and metaT coverage
- ~~`gdb_mmseqs2_uniq_metat.md`: sample GDB, unique genes and their metaT coverage~~
- ~~`gdb_rgi_other.md`: sample GDB, other RGI examples for diff. between SR/Hy and LR, and metaT coverage~~
- ~~`gdb_barrnap_metat.md`: sample GDB, metaT cov. of some predicted rRNA genes (barrnap)~~
\ No newline at end of file
# About
Results archive created for manuscript submission.
# Archived files
Abbreviations and names:
- sample names: Zymo, NWC, GDB, Rumen
- SR: short-read (data)
- LR: long-read (data)
- Hy: hybrid (SR+LR)
- metaG: metagenomic data
- metaT: metatranscriptomic data
  - only for GDB
- metaP: metaproteomic data
  - only for GDB
For each sample, the following files were saved (in `<sample>/results/`):
- preprocessing stats:
  - SR (metaG/metaT), FastP stats: `preproc/*/sr/fastp.*`
  - SR/LR (metaG/metaT, GDB), bbmap stats: `preproc/*/*/bbmap.*`
- reads QC:
  - SR (metaG/metaT), FastQC: `qc/*/sr/*_fastqc.zip`
  - LR, NanoStats: `qc/*/lr/NanoStats.*`
- assembly
  - "raw" contigs: `assembly/*/*/ASSEMBLY.fasta`
  - polished contigs: `assembly/*/*/ASSEMBLY.POLISHED.fasta` (for LR and Hy only)
  - polishing rounds: `assembly/*/*/POLISHING_*/ASSEMBLY.fasta`
- assembly mapping
  - BAM files: `mapping/*/*/*/ASSEMBLY.POLISHED.*.bam`
  - mapping rate summary stats: `mapping/*/mappability.tsv`
- annotation
  - protein prediction w/ Prodigal
    - FAA FASTA file: `annotation/prodigal/*/*/proteins.faa`
    - summary (gene counts): `annotation/prodigal/summary.gene.counts.tsv`
    - summary (gene lengths): `annotation/prodigal/summary.gene.length.tsv`
  - AMR prediction w/ RGI (CARD)
    - RGI output: `annotation/rgi/*/*/rgi.txt`
    - summary: `annotation/rgi/summary.tsv`
  - rRNA gene prediction w/ barrnap
    - GFF file: `annotation/barrnap/*/*/*.gff`
    - summary: `annotation/barrnap/summary.tsv`
- analysis:
  - assembly quality stats w/ QUAST
    - QUAST report: `analysis/quast/*/*/report.tsv`
    - summary: `analysis/quast/summary_report.tsv`
  - analysis w/ Mash: `analysis/mash/contigs.*` (sketch and distances)
  - DIAMOND search in UniProt database
    - DIAMOND hits: `analysis/diamond/*.tsv`
    - summary: `analysis/diamond/summary.db.tsv`
  - protein clustering w/ mmseqs2
    - clusters: `analysis/mmseqs2/clusters.tsv`
    - summary: `analysis/mmseqs2/summary.tsv`
- extra data/analysis for GDB:
  - metaP reports: `metap/20210323/*_Default_Peptide_Report.txt`, `metap/20210323/*_Default_Protein_Report.txt`
  - AMR/metaT analysis: `report/amr.tsv`
  - "high-confidence" proteins: `report/mmseqs2_highconf.tsv`
  - metaT cov. of exclusive proteins: `report/mmseqs2_uniq.tsv`
  - metaP-based summary: `report/metap.txt`
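
To check that an extracted archive contains the files listed above, something like the following sketch could be used. It is not part of the workflow: the `missing_patterns` helper, the `archive` root directory, and the chosen subset of patterns are assumptions made for illustration; only the glob patterns themselves are taken from the list.

```python
from pathlib import Path

# Subset of the glob patterns listed above, relative to <sample>/results/.
# Extend or trim this list as needed; it is illustrative, not exhaustive.
EXPECTED_PATTERNS = [
    "assembly/*/*/ASSEMBLY.fasta",
    "mapping/*/mappability.tsv",
    "annotation/prodigal/*/*/proteins.faa",
    "annotation/rgi/summary.tsv",
    "annotation/barrnap/summary.tsv",
    "analysis/quast/summary_report.tsv",
    "analysis/mmseqs2/clusters.tsv",
]


def missing_patterns(results_dir, patterns=EXPECTED_PATTERNS):
    """Return the patterns that match no file under results_dir."""
    root = Path(results_dir)
    return [p for p in patterns if not any(root.glob(p))]


if __name__ == "__main__":
    # "archive" is a hypothetical location of the extracted results archive.
    for sample in ["Zymo", "NWC", "GDB", "Rumen"]:
        missing = missing_patterns(Path("archive") / sample / "results")
        status = "complete" if not missing else "missing: " + ", ".join(missing)
        print(f"{sample}: {status}")
```

A pattern is reported as missing when no file matches it, so a sample directory that does not exist reports every pattern.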
......@@ -4,10 +4,6 @@ Notes for those trying to reproduce the results from the publication.
# Setup
Download
- the code archive from XXX and extract it
- GDB data from XXX
Install `conda` and create the main `snakemake` environment (see `README.md`).
Clone `OPERA-MS` repository
......