# About

Results archive created for manuscript submission.

# Archived files

Abbreviations and names:
- sample names: Zymo, NWC, GDB, Rumen
- SR: short-read (data)
- LR: long-read (data)
- Hy: hybrid (SR+LR)
- metaG: metagenomic data
- metaT: metatranscriptomic data
  - only for GDB
- metaP: metaproteomic data
  - only for GDB

For each sample, the following files were saved (in `<sample>/results/`):
- preprocessing stats:
    - SR (metaG/metaT), FastP stats: `preproc/*/sr/fastp.*`
    - SR/LR (metaG/metaT, GDB), bbmap stats: `preproc/*/*/bbmap.*`
- reads QC:
  - SR (metaG/metaT), FastQC: `qc/*/sr/*_fastqc.zip`
  - LR, NanoStats: `qc/*/lr/NanoStats.*`
- assembly
  - "raw" contigs: `assembly/*/*/ASSEMBLY.fasta`
  - polished contigs: `assembly/*/*/ASSEMBLY.POLISHED.fasta` (for LR and Hy only)
- assembly mapping
  - mapping rate summary stats: `mapping/*/mappability.tsv`
- annotation
  - protein prediction w/ Prodigal
    - FAA FASTA file: `annotation/prodigal/*/*/proteins.faa`
    - summary (gene counts): `annotation/prodigal/summary.gene.counts.tsv`
    - summary (gene lengths): `annotation/prodigal/summary.gene.length.tsv`
  - AMR prediction w/ RGI (CARD)
    - RGI output: `annotation/rgi/*/*/rgi.txt`
    - summary: `annotation/rgi/summary.tsv`
  - rRNA gene prediction w/ barrnap
    - GFF file: `annotation/barrnap/*/*/*.gff`
    - summary: `annotation/barrnap/summary.tsv`
- analysis:
  - assembly quality stats w/ QUAST
    - QUAST report: `analysis/quast/*/*/report.tsv`
    - summary: `analysis/quast/summary_report.tsv`
  - analysis w/ Mash: `analysis/mash/contigs.*` (sketch and distances)
  - DIAMOND search in UniProt database
    - DIAMOND hits: `analysis/diamond/*.tsv`
    - summary: `analysis/diamond/summary.db.tsv`
  - protein clustering w/ mmseqs2
    - clusters: `analysis/mmseqs2/clusters.tsv`
    - summary: `analysis/mmseqs2/summary.tsv`
- extra data/analysis for GDB:
  - metaP reports: `metap/20210323/*_Default_Peptide_Report.txt`, `metap/20210323/*_Default_Protein_Report.txt`
  - AMR/metaT analysis: `report/amr.tsv`
  - "high-confidence" proteins: `report/mmseqs2_highconf.tsv`
  - metaT coverage of exclusive proteins: `report/mmseqs2_uniq.tsv`
  - metaP-based summary: `report/metap.txt`
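
To check an unpacked copy of the archive against the list above, a minimal Python sketch could look like the following. It is not part of the archive. It assumes the archive was extracted so that each sample has its own `<sample>/results/` directory, and it uses only a subset of the glob patterns listed above. The function names and the `archive_root` argument are hypothetical.

```python
from pathlib import Path

# Subset of the glob patterns listed above, relative to <sample>/results/.
PATTERNS = [
    "preproc/*/sr/fastp.*",
    "qc/*/sr/*_fastqc.zip",
    "qc/*/lr/NanoStats.*",
    "assembly/*/*/ASSEMBLY.fasta",
    "mapping/*/mappability.tsv",
    "annotation/prodigal/*/*/proteins.faa",
    "annotation/rgi/summary.tsv",
    "analysis/quast/summary_report.tsv",
    "analysis/mmseqs2/clusters.tsv",
]


def count_archived_files(archive_root, sample, patterns=PATTERNS):
    """Return {pattern: number of matching files} for one sample."""
    # Assumption: the archive is unpacked as <archive_root>/<sample>/results/.
    results_dir = Path(archive_root) / sample / "results"
    return {p: len(list(results_dir.glob(p))) for p in patterns}


def missing_patterns(archive_root, sample, patterns=PATTERNS):
    """Return the patterns that match no file for the given sample."""
    counts = count_archived_files(archive_root, sample, patterns)
    return [p for p, n in counts.items() if n == 0]
```

For example, `missing_patterns("path/to/archive", "Zymo")` returns the patterns with no matching file for the Zymo sample. A pattern without matches is not necessarily an error, since some data types (metaT, metaP) exist only for GDB.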