Commit 302b8adf authored by Valentina Galata

notes: added notes for gdb/rgi (amr)

parent cb26f265
# About
Notes for sample `GDB` and its RGI (CARD) results.
Based on file `report/amr.tsv` created by `workflow_report`.
# Hit contig_4930:1.0-5516.0_1 (LR, Flye), ARO:3004454
RGI hit of protein `contig_4930:1.0-5516.0_1`, from `GDB`'s LR assembly constructed with `Flye`, to `ARO:3004454`.
## ARO:3004454
[ARO:3004454](https://card.mcmaster.ca/ontology/41665)
```
>gb|AAA23018.1|+|Campylobacter coli chloramphenicol acetyltransferase [Campylobacter coli]
MQFTKIDINNWTRKEYFDHYFGNTPCTYSMTVKLDISKLKKDGKKLYPTLLYGVTTIINRHEEFRTALDENGQVGVFSEMLPCYTVFHKE
TETFSSIWTEFTADYTEFLQNYQKDIDAFGERMGMSAKPNPPENTFPVSMIPWTSFEGFNLNLKKGYDYLLPIFTFGKYYEEGGKYYIPL
SIQVHHAVCDGFHVCRFLDELQDLLNK
>gb|M35190.1|+|309-932|Campylobacter coli chloramphenicol acetyltransferase [Campylobacter coli]
ATGCAATTCACAAAGATTGATATAAATAATTGGACACGAAAAGAGTATTTCGACCACTATTTTGGCAATACGCCCTGCACATATAGTATG
ACGGTAAAACTCGATATTTCTAAGTTGAAAAAGGATGGAAAAAAGTTATACCCAACTCTTTTATATGGAGTTACAACGATCATCAATCGA
CATGAAGAGTTCAGGACCGCATTAGATGAAAACGGACAGGTAGGCGTTTTTTCAGAAATGCTGCCTTGCTACACAGTTTTTCATAAGGAA
ACTGAAACCTTTTCGAGTATTTGGACTGAGTTTACAGCAGACTATACTGAGTTTCTTCAGAACTATCAAAAGGATATAGACGCTTTTGGT
GAACGAATGGGAATGTCCGCAAAGCCTAATCCTCCGGAAAACACTTTCCCTGTTTCTATGATACCGTGGACAAGCTTTGAAGGCTTTAAC
TTAAATCTAAAAAAAGGATATGACTATCTACTGCCGATATTTACGTTTGGGAAGTATTATGAGGAGGGCGGAAAATACTATATTCCCTTA
TCGATTCAAGTGCATCATGCCGTTTGTGACGGCTTTCATGTTTGCCGTTTTTTGGATGAATTACAAGACTTGCTGAATAAATAA
```
## Matched protein
Protein `contig_4930:1.0-5516.0_1` is the first protein-coding gene predicted on contig `contig_4930:1.0-5516.0` (LR, `Flye`).
The first two genes on the same contig (`Prodigal` headers):
```
>contig_4930:1.0-5516.0_1 # 182 # 448 # 1 # ID=4060_1;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.356
>contig_4930:1.0-5516.0_2 # 1073 # 2317 # 1 # ID=4060_2;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.553
```
Protein (predicted by `Prodigal`) and gene (extracted from the contig) sequences of `contig_4930:1.0-5516.0_1`:
```
>contig_4930:1.0-5516.0_1 # 182 # 448 # 1 # ID=4060_1;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.356
MQFTKIDINNWTRKEYFDHYFDNTPCTYSMTVKLDISKLKKDGKKLYPTLLYGVTTILNRHEEFRTALDKNGQVGVFFRNAALLHNLS*
>contig_4930:1.0-5516.0_1 # 182 # 448
ATGCAATTCACAAAGATTGATATAAATAATTGGACACGAAAAGAGTATTTCGACCACTATTTTGACAATACGCCCTGCACATATAGTATGACGGTAAAACTCGATATTTCTAAGTTGAAAAAGGATGGAAAAAAATTATATCCGACTCTCTTATATGGGGTTACAACAATCCTCAATCGACACGAAGAGTTCAGGACGGCATTAGATAAAAACGGACAGGTAGGTGTTTTTTTCAGAAATGCTGCCTTGCTACACAATCTTTCATAA
```
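To double-check that the `Prodigal` protein is the translation of the extracted gene, the gene can be translated with the standard genetic code. A minimal `awk`-based sketch; the helper name `translate` is hypothetical (not part of the workflow), and unlike `Prodigal` it does not translate an alternative start codon (e.g. `TTG`) as `M`:

```bash
# Translate a nucleotide string starting from its first base (frame 1),
# using the standard genetic code (NCBI table 1); stop codons are printed as '*'.
# Codons with characters other than A/C/G/T(/U) are printed as 'X';
# an incomplete trailing codon is ignored.
translate() {
    printf '%s\n' "$1" | awk '
    BEGIN {
        # amino acids for all 64 codons, bases ordered T, C, A, G at each position
        aa = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
        idx["T"] = 0; idx["C"] = 1; idx["A"] = 2; idx["G"] = 3
    }
    {
        s = toupper($0)
        gsub(/U/, "T", s)
        out = ""
        for (i = 1; i + 2 <= length(s); i += 3) {
            b1 = substr(s, i, 1); b2 = substr(s, i + 1, 1); b3 = substr(s, i + 2, 1)
            if ((b1 in idx) && (b2 in idx) && (b3 in idx))
                out = out substr(aa, 16 * idx[b1] + 4 * idx[b2] + idx[b3] + 1, 1)
            else
                out = out "X"
        }
        print out
    }'
}
```

For example, `translate ATGCAATTCACAAAGATTGATATAAATAAT` (the first 30 bases of the gene) gives `MQFTKIDINN`, the first 10 residues of the protein.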
Protein `contig_4930:1.0-5516.0_1` clusters only with itself (based on `mmseqs2` output, created by `Workflow`):
```bash
grep "contig_4930:1\.0-5516\.0_1" clusters.tsv
# flye__contig_4930:1.0-5516.0_1 flye__contig_4930:1.0-5516.0_1
```
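The regular expression in the `grep` above would also match e.g. `contig_4930:1.0-5516.0_10` if such a protein existed. A sketch that compares IDs exactly; it assumes that `clusters.tsv` is the tab-separated `mmseqs2` table shown above (representative in column 1, member in column 2), and `cluster_members` is a hypothetical helper:

```bash
# Print all members of the mmseqs2 cluster that contains protein ID.
# ASSUMPTION: FILE is tab-separated, representative in column 1 and member
# in column 2 (one line per member). IDs are compared exactly, so "..._1"
# does not match "..._10".
cluster_members() {
    awk -F '\t' -v id="$1" '
        # 1st pass over FILE: remember the representative of ID
        NR == FNR { if ($2 == id) rep = $1; next }
        # 2nd pass: print every member of that representative
        rep != "" && $1 == rep { print $2 }
    ' "$2" "$2"
}

# usage: cluster_members flye__contig_4930:1.0-5516.0_1 clusters.tsv
```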
## Looking for alternative protein coding sequences
First 1000 bases of the contig `contig_4930:1.0-5516.0`:
```
>contig_4930:1.0-5516.0 1-1000
CATTGGTTGCCGCCCAGCAGACGGCACAGGCTGCCCTAGGCACTTCACGCGGGAACAATACAGCAGCAGCTTACCTCCGGTACTGATCAGAGCGAGGGAAGTGGAGACGAGCCGAAAAAACAGGGATGGTTTAGTCGGTTGTTTCGTGGTTAGATAAATCAGGATTTGCGGAGGATAAATGATGCAATTCACAAAGATTGATATAAATAATTGGACACGAAAAGAGTATTTCGACCACTATTTTGACAATACGCCCTGCACATATAGTATGACGGTAAAACTCGATATTTCTAAGTTGAAAAAGGATGGAAAAAAATTATATCCGACTCTCTTATATGGGGTTACAACAATCCTCAATCGACACGAAGAGTTCAGGACGGCATTAGATAAAAACGGACAGGTAGGTGTTTTTTTCAGAAATGCTGCCTTGCTACACAATCTTTCATAAGGAAACTGAAACCTTTTCGAGTATTTGGACTGAGTTTACAGCAGACTATACTGAGTTTCTTCAGAACTATCAAAAGGATATAGACGCTTATGGTGAACGAAAGGGAATGTTCGCAAAGCCTAATCCTCCGGAAAACACTTTCCCTGTTTCTATGATACCGTGGACAAGCTTTGAAGGCTTTAACTTAAATCTAAAAAAGGGATATGACTATCTACTGCCGATATTTACGTTTGGGAAGTATTATGAGGATGGCGGAAAATACTATATTCCCTTATCGATTCAAGTGCATCATGCCGTTTGTGATGGCTTTCATGTTTGCCGTTTTTTGGATGAATTACAAGACTTGCTGAATAAATAAAATCCAAGTTTGTCGTGGAAACGGGGAGCCGCCCGCAGGCGGCAGGGGCAGCGCCCCTCCGGAGCAGTCCGGCTAATGGCGCATAAGCGTCGAGGCAGGCTGCTCCGGAAACCCTCCCCTTTAGGGGCGCAAAGCACTTTCCTACTGCTGCACCAACGGTGCTGCGGTGGGAAAGTGCTTTTGCGTTACTTTCT
```
extracted using
```bash
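# assumes the sequence is on one line (grep -A1); wrap at 100 bp, keep header + first 10 lines (= 1000 bases)
grep -A1 ">contig_4930:1.0-5516.0" assembly/lr/flye/ASSEMBLY.POLISHED.fasta | sed -r 's/(.{100})/\1\n/g' | head -n 11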
```
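`grep -A1` returns the whole sequence only if each FASTA record is on a single line. A sketch that also works for wrapped (multi-line) FASTA files and selects the record by its exact ID (first word of the header); `fasta_region` is a hypothetical helper:

```bash
# Print bases START..END (1-based, inclusive) of the FASTA record whose ID
# (first word of the header line, without ">") is exactly ID.
# Works for single-line and wrapped (multi-line) records.
fasta_region() {
    awk -v id="$2" -v s="$3" -v e="$4" '
        /^>/ {
            if (found) exit              # requested record is complete
            found = (substr($1, 2) == id)
            next
        }
        # only collect sequence lines until END is reached
        found && length(seq) < e + 0 { seq = seq $0 }
        END { if (seq != "") print substr(seq, s, e - s + 1) }
    ' "$1"
}

# usage: first 1000 bases, wrapped at 100 bp
# fasta_region assembly/lr/flye/ASSEMBLY.POLISHED.fasta "contig_4930:1.0-5516.0" 1 1000 | fold -w 100
```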
Using the [ExPASy](https://web.expasy.org/translate/) translation tool, the two longest translations with both a start and a stop codon are:
```
> contig_4930:1.0-5516.0_1 (VIRT-56253:5'3' Frame 2, start_pos=60)
MQFTKIDINNWTRKEYFDHYFDNTPCTYSMTVKLDISKLKKDGKKLYPTLLYGVTTILNRHEEFRTALDKNGQVGVFFRNAALLHNLS*
> contig_4930:1.0-5516.0_1NEW (VIRT-7066:5'3' Frame 3, start_pos=139)
MLPCYTIFHKETETFSSIWTEFTADYTEFLQNYQKDIDAYGERKGMFAKPNPPENTFPVSMIPWTSFEGFNLNLKKGYDYLLPIFTFGKYYEDGGKYYIPLSIQVHHAVCDGFHVCRFLDELQDLLNK*
```
The first protein is the same as the protein predicted by `Prodigal`, i.e. it is `contig_4930:1.0-5516.0_1`.
The second protein was **not** predicted by `Prodigal`; we label it `contig_4930:1.0-5516.0_1NEW`.
The complete `ExPASy` output for these two translations is in `lr_flye_contig_4930:1.0-5516.0_1-1000_translated.txt`.
Nucleotide (gene) sequence of the "new" protein `contig_4930:1.0-5516.0_1NEW`:
```
>contig_4930:1.0-5516.0_1NEW # 420 # 806
atgctgccttgctacacaatctttcataaggaaactgaaaccttttcgagtatttggactgagtttacagcagactatactgagtttcttcagaactatcaaaaggatatagacgcttatggtgaacgaaagggaatgttcgcaaagcctaatcctccggaaaacactttccctgtttctatgataccgtggacaagctttgaaggctttaacttaaatctaaaaaagggatatgactatctactgccgatatttacgtttgggaagtattatgaggatggcggaaaatactatattcccttatcgattcaagtgcatcatgccgtttgtgatggctttcatgtttgccgttttttggatgaattacaagacttgctgaataaataa
```
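The gene coordinates are consistent with the `ExPASy` frames: on the forward strand, a gene starting at position `s` is in frame `((s - 1) mod 3) + 1`, and a gene from `s` to `e` that ends with a stop codon encodes `(e - s + 1) / 3 - 1` residues. A small sketch (hypothetical helper names):

```bash
# Forward-strand reading frame (1, 2 or 3) of a gene starting at 1-based
# position START, numbered like the 5'3' frames of ExPASy Translate
# (frame 1 starts at base 1).
frame_of() {
    echo $(( ($1 - 1) % 3 + 1 ))
}

# Number of residues encoded by a gene from START to END (1-based, inclusive),
# assuming the last codon is a stop codon, which is not counted.
aa_length() {
    echo $(( ($2 - $1 + 1) / 3 - 1 ))
}
```

`frame_of 182` and `frame_of 420` give 2 and 3 (the frames reported for `_1` and `_1NEW`), and `aa_length 182 448` and `aa_length 420 806` give 88 and 128, the residue counts at the end of the protein alignment below.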
## MSA w/ ARO:3004454
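Multiple-sequence alignment using [EMBOSS Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/) (`VIRT-56253` is `contig_4930:1.0-5516.0_1` and `VIRT-7066` is `contig_4930:1.0-5516.0_1NEW`, see the `ExPASy` headers above):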
```
VIRT-56253:5'3'                 MQFTKIDINNWTRKEYFDHYFDNTPCTYSMTVKLDISKLKKDGKKLYPTLLYGVTTILNR 60
gb|AAA23018.1|+|Campylobacter   MQFTKIDINNWTRKEYFDHYFGNTPCTYSMTVKLDISKLKKDGKKLYPTLLYGVTTIINR 60
VIRT-7066:5'3'                  ------------------------------------------------------------ 0

VIRT-56253:5'3'                 HEEFRTALDKNGQVGVFFRNAALLHNLS*------------------------------- 88
gb|AAA23018.1|+|Campylobacter   HEEFRTALDENGQVGVFSEMLPCYTVFHKETETFSSIWTEFTADYTEFLQNYQKDIDAFG 120
VIRT-7066:5'3'                  -------------------MLPCYTIFHKETETFSSIWTEFTADYTEFLQNYQKDIDAYG 41

VIRT-56253:5'3'                 ------------------------------------------------------------ 88
gb|AAA23018.1|+|Campylobacter   ERMGMSAKPNPPENTFPVSMIPWTSFEGFNLNLKKGYDYLLPIFTFGKYYEEGGKYYIPL 180
VIRT-7066:5'3'                  ERKGMFAKPNPPENTFPVSMIPWTSFEGFNLNLKKGYDYLLPIFTFGKYYEDGGKYYIPL 101

VIRT-56253:5'3'                 ---------------------------- 88
gb|AAA23018.1|+|Campylobacter   SIQVHHAVCDGFHVCRFLDELQDLLNK- 207
VIRT-7066:5'3'                  SIQVHHAVCDGFHVCRFLDELQDLLNK* 128
```
```
contig_4930:1.0-5516.0_1              ATGCAATTCACAAAGATTGATATAAATAATTGGACACGAAAAGAGTATTTCGACCACTAT 60
gb|M35190.1|+|309-932|Campylobacter   ATGCAATTCACAAAGATTGATATAAATAATTGGACACGAAAAGAGTATTTCGACCACTAT 60
contig_4930:1.0-5516.0_1NEW           ------------------------------------------------------------ 0

contig_4930:1.0-5516.0_1              TTTGACAATACGCCCTGCACATATAGTATGACGGTAAAACTCGATATTTCTAAGTTGAAA 120
gb|M35190.1|+|309-932|Campylobacter   TTTGGCAATACGCCCTGCACATATAGTATGACGGTAAAACTCGATATTTCTAAGTTGAAA 120
contig_4930:1.0-5516.0_1NEW           ------------------------------------------------------------ 0

contig_4930:1.0-5516.0_1              AAGGATGGAAAAAAATTATATCCGACTCTCTTATATGGGGTTACAACAATCCTCAATCGA 180
gb|M35190.1|+|309-932|Campylobacter   AAGGATGGAAAAAAGTTATACCCAACTCTTTTATATGGAGTTACAACGATCATCAATCGA 180
contig_4930:1.0-5516.0_1NEW           ------------------------------------------------------------ 0

contig_4930:1.0-5516.0_1              CACGAAGAGTTCAGGACGGCATTAGATAAAAACGGACAGGTAGGTGTTTTTTTCAGAAAT 240
gb|M35190.1|+|309-932|Campylobacter   CATGAAGAGTTCAGGACCGCATTAGATGAAAACGGACAGGTAGGCGTTTTT-TCAGAAAT 239
contig_4930:1.0-5516.0_1NEW           ----------------------------------------------------------at 2
                                                                                                **

contig_4930:1.0-5516.0_1              GCTGCCTTGCTACACAATCTTTCATAA--------------------------------- 267
gb|M35190.1|+|309-932|Campylobacter   GCTGCCTTGCTACACAGTTTTTCATAAGGAAACTGAAACCTTTTCGAGTATTTGGACTGA 299
contig_4930:1.0-5516.0_1NEW           gctgccttgctacacaatctttcataaggaaactgaaaccttttcgagtatttggactga 62
                                      **************** * ********

contig_4930:1.0-5516.0_1              ------------------------------------------------------------ 267
gb|M35190.1|+|309-932|Campylobacter   GTTTACAGCAGACTATACTGAGTTTCTTCAGAACTATCAAAAGGATATAGACGCTTTTGG 359
contig_4930:1.0-5516.0_1NEW           gtttacagcagactatactgagtttcttcagaactatcaaaaggatatagacgcttatgg 122

contig_4930:1.0-5516.0_1              ------------------------------------------------------------ 267
gb|M35190.1|+|309-932|Campylobacter   TGAACGAATGGGAATGTCCGCAAAGCCTAATCCTCCGGAAAACACTTTCCCTGTTTCTAT 419
contig_4930:1.0-5516.0_1NEW           tgaacgaaagggaatgttcgcaaagcctaatcctccggaaaacactttccctgtttctat 182

contig_4930:1.0-5516.0_1              ------------------------------------------------------------ 267
gb|M35190.1|+|309-932|Campylobacter   GATACCGTGGACAAGCTTTGAAGGCTTTAACTTAAATCTAAAAAAAGGATATGACTATCT 479
contig_4930:1.0-5516.0_1NEW           gataccgtggacaagctttgaaggctttaacttaaatctaaaaaagggatatgactatct 242

contig_4930:1.0-5516.0_1              ------------------------------------------------------------ 267
gb|M35190.1|+|309-932|Campylobacter   ACTGCCGATATTTACGTTTGGGAAGTATTATGAGGAGGGCGGAAAATACTATATTCCCTT 539
contig_4930:1.0-5516.0_1NEW           actgccgatatttacgtttgggaagtattatgaggatggcggaaaatactatattccctt 302

contig_4930:1.0-5516.0_1              ------------------------------------------------------------ 267
gb|M35190.1|+|309-932|Campylobacter   ATCGATTCAAGTGCATCATGCCGTTTGTGACGGCTTTCATGTTTGCCGTTTTTTGGATGA 599
contig_4930:1.0-5516.0_1NEW           atcgattcaagtgcatcatgccgtttgtgatggctttcatgtttgccgttttttggatga 362

contig_4930:1.0-5516.0_1              ------------------------- 267
gb|M35190.1|+|309-932|Campylobacter   ATTACAAGACTTGCTGAATAAATAA 624
contig_4930:1.0-5516.0_1NEW           attacaagacttgctgaataaataa 387
```
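To locate indels in a block of the nucleotide alignment, two aligned rows can be compared column by column. A sketch (`indel_columns` is a hypothetical helper) that reports the columns where exactly one of the two rows has a gap:

```bash
# Print the alignment columns where exactly one of two aligned rows has a gap ('-'),
# as: column <TAB> character in row 1 <TAB> character in row 2.
# OFFSET is the alignment column of the first character of the rows (default 1).
indel_columns() {
    awk -v a="$1" -v b="$2" -v off="${3:-1}" 'BEGIN {
        n = (length(a) < length(b)) ? length(a) : length(b)
        for (i = 1; i <= n; i++) {
            x = substr(a, i, 1); y = substr(b, i, 1)
            if ((x == "-") != (y == "-"))
                printf "%d\t%s\t%s\n", off + i - 1, x, y
        }
    }'
}
```

Applied to the `contig_4930:1.0-5516.0_1` and `gb|M35190.1|...` rows of the block starting at column 181, it reports only column 232, where the contig has a seventh `T` in a poly-T run that has six `T`s in `M35190.1`.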
Complete alignment output is in `lr_flye_contig_4930:1.0-5516.0_1-1000_translated_alignment.txt`.
A plot for protein sequences created using `BioJS-MSA` was saved to `lr_flye_contig_4930:1.0-5516.0_1-1000_translated_alignment_biojs-msa.png`.
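The alignments are almost perfect, with only a few mismatches. Neither protein covers the complete ARO sequence on its own, but together they span it: `contig_4930:1.0-5516.0_1` aligns to the N-terminal part and `contig_4930:1.0-5516.0_1NEW` to the C-terminal part. In the nucleotide alignment, the contig has one extra `T` in a poly-T run (`GTTTTTTTC` vs. `GTTTTT-TC` in `M35190.1`), which shifts the reading frame, so that `contig_4930:1.0-5516.0_1` reaches a stop codon early.
`contig_4930:1.0-5516.0_1` (182-448) and `contig_4930:1.0-5516.0_1NEW` (420-806) overlap by 29 bp, `atgctgccttgctacacaatctttcataa`, shown in the frame of `_1` (first two lines) and of `_1NEW` (last two lines):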
```
atgctgccttgctacacaatctttcataa
...A..A..L..L..H..N..L..S..*.
atgctgccttgctacacaatctttcataa
.M..L..P..C..Y..T..I..F..H...
```
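The 29 bp overlap follows from the gene coordinates (`_1`: 182-448, `_1NEW`: 420-806). A sketch for 1-based, inclusive intervals (`overlap_len` is a hypothetical helper):

```bash
# Length of the overlap of two 1-based, inclusive intervals S1..E1 and S2..E2
# (0 if they do not overlap).
overlap_len() {
    local s=$(( $1 > $3 ? $1 : $3 ))   # later start
    local e=$(( $2 < $4 ? $2 : $4 ))   # earlier end
    echo $(( e >= s ? e - s + 1 : 0 ))
}
```

`overlap_len 182 448 420 806` gives 29. Since `420 - 182 = 238` is not a multiple of 3, the two genes read the shared bases in different frames.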
## metaT coverage
Using file `mapping/metat/lr/flye/ASSEMBLY.POLISHED.sr.cov.perbase` created by `workflow`.
MetaT coverage of the first 1000 bases of contig `contig_4930:1.0-5516.0` (i.e. including `contig_4930:1.0-5516.0_1` and `contig_4930:1.0-5516.0_1NEW`):
```bash
# grep for contig id and get first 1000 bases
grep -F "contig_4930:1.0-5516.0" mapping/metat/lr/flye/ASSEMBLY.POLISHED.sr.cov.perbase | head -n 1000 | less
```
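To summarise the metaT coverage per gene instead of paging through it, the per-base depth can be averaged over each gene's coordinates. This sketch **assumes** that the `perbase` file has tab-separated columns contig, position and depth (as written by e.g. `bedtools genomecov -d`); the actual format should be checked. `mean_cov` is a hypothetical helper:

```bash
# Mean per-base depth of CONTIG over START..END (1-based, inclusive).
# ASSUMPTION: FILE is tab-separated with columns contig, position, depth;
# positions missing from FILE count as depth 0.
mean_cov() {
    awk -F '\t' -v c="$2" -v s="$3" -v e="$4" '
        $1 == c && $2 + 0 >= s + 0 && $2 + 0 <= e + 0 { sum += $3 }
        END { printf "%.2f\n", sum / (e - s + 1) }
    ' "$1"
}

# usage (contig_4930:1.0-5516.0_1 and contig_4930:1.0-5516.0_1NEW):
# mean_cov mapping/metat/lr/flye/ASSEMBLY.POLISHED.sr.cov.perbase "contig_4930:1.0-5516.0" 182 448
# mean_cov mapping/metat/lr/flye/ASSEMBLY.POLISHED.sr.cov.perbase "contig_4930:1.0-5516.0" 420 806
```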
# About
Additional analysis focusing on AMR, i.e. RGI (CARD) results, in GDB.
File `amr.tsv`:
- table created using `workflow_report` (2021.01.28)
- all RGI hits to AROs which were found (non-nudged hits) in SR/Hy but **NOT** in LR
- `amr.xlsx`: annotated table
# ARO:3000194 and MMSEQS2 cluster
From `mmseqs2` results: cluster `operamsmetaspades__opera_contig_2775_1` contains:
- operamsmetaspades__opera_contig_2775_1
- metaspades__NODE_8958_length_2718_cov_51.879591_1
- raven__Utg193156:1.0-35862.0_56
- metaspadeshybrid__NODE_230_length_104252_cov_9.790458_35
- metaspadeshybrid__NODE_348_length_69013_cov_11.179239_18
- megahit__k141_88921_2
- operamsmegahit__opera_contig_1207_6
All, except for `raven__Utg193156:1.0-35862.0_56`, are **perfect** hits to `ARO:3000194` (tetW).
RGI hit for `raven__Utg193156:1.0-35862.0_56`: a **strict** hit (not nudged) to `ARO:3004442` (tet(W/N/W)).
Both AROs are very similar: identity 95.1%, similarity 97.5%, no gaps.
Complete output is in `ARO3004442_ARO3000194_alignment.txt` (using [EMBOSS Needle](https://www.ebi.ac.uk/Tools/psa/emboss_needle/)).
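As far as we understand EMBOSS Needle's output, identity is the number of identical aligned positions divided by the alignment length (gaps included). A sketch computing this for two aligned rows; `pct_identity` is a hypothetical helper, and similarity (which needs a substitution matrix) is not computed:

```bash
# Percent identity of two aligned rows of equal length: identical non-gap
# columns divided by the alignment length (gap columns included), case-insensitive.
pct_identity() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = length(a); same = 0
        for (i = 1; i <= n; i++) {
            x = toupper(substr(a, i, 1))
            if (x != "-" && x == toupper(substr(b, i, 1))) same++
        }
        if (n > 0) printf "%.1f\n", 100 * same / n
    }'
}
```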
**Conclusions**
- apart from the outlier below, the cluster contains only perfect hits to `ARO:3000194` and none of the strict/nudged hits
- outlier/exception: `raven__Utg193156:1.0-35862.0_56`, which seems to represent a different but highly similar ARO
# Hit contig_4930:1.0-5516.0_1 (LR, Flye), ARO:3004454
See file `notes/gdb_rgi_aro3004454_flye.md`.
# Hit contig_820:1.0-90197.0_60/contig_820:1.0-90197.0_61 (LR, Flye), ARO:3000194
## CARD ARO
[ARO:3000194](https://card.mcmaster.ca/ontology/36333)
```
>gb|CAA10975.1|+|tetW [Butyrivibrio fibrisolvens]
MKIINIGILAHVDAGKTTLTESLLYASGAISEPGSVEKGTTRTDTMFLERQRGITIQAAVTSFQWHRCKVNIVDTPGHMDFLAEVYRSLA
VLDGAILVISAKDGVQAQTRILFHALRKMNIPTVIFINKIDQAGVDLQSVVQSVRDKLSADIIIKQTVSLSPEIVLEENTDIEAWDAVIE
NNDELLEKYIAGEPISREKLAREEQQRVQDASLFPVYHGSAKNGLGIQPLMDAVTGLFQPIGEQGGAALCGSVFKVEYTDCGQRRVYLRL
YSGTLRLRDTVALAGREKLKITEMRIPSKGEIVRTDTAYQGEIVILPSDSVRLNDVLGDQTRLPRKRWREDPLPMLRTTIAPKTAAQRER
LLDALTQLADTDPLLRCEVDSITHEIILSFLGRVQLEVVSALLSEKYKLETVVKEPSVIYMERPLKAASHTIHIEVPPNPFWASIGLSVT
PLSLGSGVQYESRVSLGYLNQSFQNAVRDGIRYGLEQGLFGWNVTDCKICFEYGLYYSPVSTPADFRSLAPIVLEQALKESGTQLLEPYL
SFILYAPQEYLSRAYHDAPKYCATIETAQVKKDEVVFTGEIPARCIQAYRTDLAFYTNGRSVCLTELKGYQAAVGQPVIQPRRPNSRLDK
VRHMFQKVM
```
## Gene/protein
Protein sequences:
```
>contig_820:1.0-90197.0_60 # 42580 # 43047 # -1 # ID=4927_60;partial=00;start_type=TTG;rbs_motif=AGGAG/GGAGG;rbs_spacer=11-12bp;gc_cont=0.528
MLSEKYKLETVVKEPSVIYMERPLKAASHTIHIEVPPNPFWASIGLSVTPLSLGSGVQYESRVSLGYLNQSFQNAVRDGIRYGLEQGLFGWNVTDCKICFEYGLYYSPVSTPADFRSLAPIVLEQALKESGTQLLEPYLSFIPLCAPGIPFQGLS*
>contig_820:1.0-90197.0_61 # 43382 # 44248 # -1 # ID=4927_61;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.529
MKIINIGILAHVDAGKTTLTESLLYASGAISEPGSVEKGTTRTDTMFLERQRGITIQAAVTSFQWHRCKVNIVDTPGHMDFLAEVYRSLAVLDGAILVISAKDGVQAQTRILFHALRKMNIPTVIFINKIDQAGVDLQSVVQSVRDKLSADIIIKQTVSLSPEIVLEENTDIEAWDAVIENNDELLEKYIAGEPISREKLAREEQQRVQDASLFPVYHGSAKNGLGIQPLMDAVTGLFQPIGEQGAPPHAAAFSRLSTPIAASGVSIYGYTAERCACGIRWPWPGEKS*
```
When translating a part of the contig containing both genes, the two genes are in different reading frames:
- `contig_820:1.0-90197.0_60`: 3'5' frame 1 (-1 strand)
- `contig_820:1.0-90197.0_61`: 3'5' frame 3 (-1 strand)
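Both genes lie on the -1 strand, so their coding sequences are the reverse complements of the contig regions given in the `Prodigal` headers (e.g. bases 43,382 to 44,248 for `contig_820:1.0-90197.0_61`). A reverse-complement sketch (`revcomp` is a hypothetical helper):

```bash
# Reverse complement of a nucleotide string; case is kept for A/C/G/T,
# any other character (e.g. N) becomes N.
revcomp() {
    printf '%s\n' "$1" | awk '
    BEGIN {
        split("A C G T a c g t", from, " ")
        split("T G C A t g c a", to, " ")
        for (i = 1; i <= 8; i++) comp[from[i]] = to[i]
    }
    {
        out = ""
        for (i = length($0); i >= 1; i--) {
            x = substr($0, i, 1)
            c = (x in comp) ? comp[x] : "N"
            out = out c
        }
        print out
    }'
}
```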
## MSA w/ CARD ARO
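Alignments of both proteins with the ARO sequence using [EMBOSS Needle](https://www.ebi.ac.uk/Tools/psa/emboss_needle/), see file `lr_flye_contig_820:1.0-90197.0_proteins60-61_alignment.txt`:
- `contig_820:1.0-90197.0_61` matches the beginning of the ARO sequence
- `contig_820:1.0-90197.0_60` matches a later part of the ARO sequence, but its last residues (`PLCAPGIPFQGLS`) do not match the ARO C-terminus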
## metaT coverage
```bash
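# lines 42,000-46,000 of the per-base output (= positions 42,000-46,000 if every position is listed from 1)
grep "contig_820:1.0-90197.0" mapping/metat/lr/flye/ASSEMBLY.POLISHED.sr.cov.perbase | sed -n -e '42000,46000p'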
```
## Contig
Bases 41,801 to 45,900 (lines 420 to 460 of the output wrapped at 100 bp; line 1 is the header):
```bash
grep -A1 "contig_820:1.0-90197.0" assembly/lr/flye/ASSEMBLY.POLISHED.fasta | sed -r 's/(.{100})/\1\n/g' | sed -n -e '420,460p'
```
```
AAACAGCCATAAGAAGTGGTTCAATATCCGTAATATCCATAATTGCCATATTAGAAACTGCTTTCGTAAAGGCTCGATTTCTTTTTAATTCTAATATACT
TTTTCTATCGGTCGCATCCGCCACACAAAATTCAATTTGTTTTGCATATTGTGATTGCCGTCTTTTAGCCAATTCTATCATTTTTTTGCTGTAATCAAAA
GCGACAACCGAAGCGCCTCTTTGTGCAAGATACGAAGAATAATTTCCATTGCCACACGCAATATCCAAAATGTAATCCGCAGGATTAGGAGATAGAAGTT
CCGTTACTTTGGGACGCACTACCTCTCTGTGAAATTCATTAGATTCGTCACCCATTGCATTATCCCAAAATTGTGCGTTCTCCTCCCAGATTTTTTTTAC
TTTCCTCTGTTCCCATGTTCTCTCCCACTCCCCAAATTTGCTTTTTTGCTTCCATTAAATCTTCCTTACTATATTCCATTGTTACCCTCCATAACTTCTG
ATTGTTGCCGTCTTGACGATTATGTATCTTTACATTACCTTCTGAAACATATGGCGCACCTTGTCCAGGCGGCTGTTTGGACGGCGGGGCTGGATGACCG
GCTGACCGACAGCGGCCTGATATCCTTTCAGCTCTGTAAGGCATACGCTCCGCCCGTTGGTGTAAAAGGCCAGATCAGTACGGTATGCCTGTATACAGCG
GGCGGGAATCTCGCCAGTAAAGACAACTTCATCCTTTTTTACCTGGGCCGTTTCGATGGTGGCACAGTATTTCGGTGCATCATGATAAGCCCTGGAAAGG
TATTCCTGGGGCGCATAGAGGGATGAAGGAGAGATAAGGTTCCAGCAGCTGCGTCCCCGATTCCTTCAATGCCTGTTCCAATACAATCGGGGCCAATGAG
CGGAAGTCCGCCGGCGTGCTGACCGGACTGTAATAAAGCCCGTATTCAAAGCAAATCTTACAGTCCGTTACGTTCCAGCCGAACAAGCCCTGCTCCAGCC
CGTAACGGATACCATCCCTGACAGCGTTTTGAAAACTCTGGTTCAAGTATCCCAGCGAAACCCGGCTCTCGTATTGTACACCGGAGCCAAGCGAGAGTGG
TGTAACAGACAGTCCTATGGATGCCCAAAACGGGTTGGGCGGCACCTCGATATGGATGGTGTGGCTGGCTGCTTTGAGCGGCCGCTCCATATAAATGACG
GAGGGTTCCTTTACCACTGTTTCAAGCTTGTATTTTTCCGACAGCAAAGCGGAAACAACCTCCAACTGCACCCGGCCCAAAAAGAAAGAATGATCTCATG
GGTGATGGAATCCACTTCGCAACGCAAAAGCGGGTCAGTATCCGCAAGTTGCGTAAGAGCGTCCAGCAGCCGTTCTCTTTGCGCTGCCGTTTTCGGCGCA
ATCGTCGTCCGCAGCATGGGGAGGGGGTCCTCGCGCCACCTTTTACGAGGGAGCCGGGTTTGGTCCCCTAATACATCGTTTAACCTCACGCTGTCGCTGG
GAAGGATAACAATTTCACCCTGATAAGCGGTGTCTGTCCGAACAATTTCCCCTTTGGATGGAATACGCATCTCTGTGATTTTCAGCTTTTCTCTCCCGGC
CAGGGCCACCGTATCCCGCAGGCGCAGCGTTCCGCTGTATAACCGTAGATAGACACGCCGCTGGCCGCAATCGGTGTACTCAACCTTGAAAACGCTGCCG
CATGGGGCGGCGCCCCCTGTTCCCCAATCGGTTGGAACAGCCCTGTCACCGCATCCATCAACGGTTGAATGCCAAGGCCATTTTTGGCGCTGCCATGATA
GACTGGGAACAGGGAGGCGTCTTGAACCCGCTGCTGTTCCTCCCGCGCAAGTTTTTCCCGGCTGATTGGTTCTCCTGCGATATACTTTTCCAATAATTCA
TCGTTATTTTCGATGACCGCATCCCATGCTTCTATGTCGGTATTTTCCTCCAGGACTATTTCCGGGGACAGCGACACCGTCTGCTTGATGATAATATCGG
CGGAGAGCTTATCCCGAACAGACTGAACCACGCTCTGCAAATCAACGCCAGCCTGGTCGATCTTGTTGATAAAGATAACGGTGGGAATGTTCATTTTCCG
CAGGGCATGGAACAGAATACGGGTCTGGGCCTGCACGCCATCTTTAGCGGAGATCACCAAGATGGCCCCATCTAAAACAGCCAAAGAGCGGTACACCTCC
GCCAAAAAATCCATGTGGCCGGGCGTATCCACAATGTTAACTTTACATCTGTGCCACTGGAAGGAAGTGACTGCCGCTTGAATGGTAATCCCACGCTGCC
GCTCCAAAAACATGGTGTCCGTCCTCGTTGTCCCTTTTTCGACGCTCCCCGGTTCTGAAATGGCTCCGCTGGCATATAGCAGGCTCTCCGTCAAGGTCGT
CTTTCCAGCGTCTACATGGGCAAGAATTCCAATATTGATTATTTTCATGTGATTGTCCTCCCTTTACAGCCCCAAAGGGCATAAAAATCCCCAGCAGTAA
AATACTTTTACCACTGGGGATTATAAGTTGCGGACATACACATATACAGCATACACCTGTTTGTGATTGCTGTTTTTGGGGATATGTCAAAATTGATAAG
GCAAAAGTATTCTTAAATTGGGTACAAAAAACTAAGCCCCTACAAAAGGAGCTATCATAATCCTTTGTTCCCACTATTTGATTATAGTTTTATTTAAGAA
TACCTTGCCGCATATTTTTTACTCCTTTTCTGGATTAAATCATTGTATCACATCAGTTTTAGGAAAGCAAGTACCTAAAAGAAATTTTTCTTCCCCTTAT
ATGTAACAATCATACCGGCTTCCTAGCGTTCAGAATGTTTTCTGCTGTCTGCTGTGGTGTTTGGTTGGAATTGTCCAACCAAAAGCCGATCCGTGGTGTT
GTCTGCATTTTACTAAATACAAATTCAATGTATACAGAAAGATATAAGGAGTGGGAGGGATTCCGCCGTAGTTGGCATTGTAGGAAAATCCAAAAGTTTA
GATTTTCCCACAATGCTTATCTTTTGGTCTTTGGTTCGGAATAGTGTAGTGCTGGCGGTCTATCTCTTGTTTTCGGTTGCTTGCTTCCTTACCGTACATG
AGCATTGGGCCGGGTTTAAACAGATCATTATTATAAATGCTCATTGGGAAAAAAAGGTTTCTAAAAAAAGAGAGAAAAATAAAAAAAAAAAAAGACTTTA
AGAGAAAAAAAAAAAGGGAAAGGGGAAGATCATACCGGAATTTGCCGGATGGAAGTTTGAACATGCTGGTCGGGCCGAAACGGCATTAATGATGTATTAC
CAGAAAACGCTGCTTCCAGGCGTTATTAGCTTGGAAAGCAGGGACTTAACTGTGAATCCTGATATATATACATTTCCTATCCCGCTCGAAATTTCGCAAA
ACGGTGGATCATTAAGTTCATGTCAAGGGGTTACCGACGAAATGGGTGCACAGCTGGCGCAACTGACCGTGAATAAAATTGCCGAGGCGGTAAGGGGGTT
TCTACAGATAAAAACAAATTGATTAAAATACAACATAAAAAGAAAAAGTTAAGGAATGAAAATATAATGCTACATACAAGAGTAAGTGGAGGTTTTATTG
CATGCTACATCATTTCCAGGTAAATACGGTATTGGAGACTTGGGAGAAACGGCAGAGTATACGGAATAGGCAGGGATAAGCTTTATGAGATGACGAGCCT
TGCTAACTGTCCGTTTGTTCTATGGGAAGGCAACCGGAGAATGATTAAACGCAGGATTTTTGATGAGTACATCGAGCAGATGTATTTGATATGATGGAGA
AAATATGGACGAGAAAATTATGGCTTACAAAGTCGAACTCCAAAAGCAAAGTGAAAACACACCGTTTTGGACAAAAAAGATTTTAACCGTTAAAGAAGCT
GCCGAATACACCGGAACCGGAAGGGCAAAAATCCGTCAAATAATTACGCAAGAAAATTGCCCTTTTACTGTGAACAACGGTACACAGATATGTGTGATCC
GGGAAGCGTTCATTGATTATCTCGATAAGCAGCTTCGCATTTTAGCTGCCATTCATTAAGACATACCTTTACAACAAGAATGTACAAAGCGGGGGGGTGA
```
## MMSEQS2
```bash
grep "contig_820:1.0-90197.0_60" clusters.tsv
# flye__contig_2250:1.0-180537.0_144 flye__contig_820:1.0-90197.0_60
grep "contig_2250:1.0-180537.0_144" clusters.tsv
# flye__contig_2250:1.0-180537.0_144 flye__contig_2250:1.0-180537.0_144
# flye__contig_2250:1.0-180537.0_144 flye__contig_820:1.0-90197.0_60
# flye__contig_2250:1.0-180537.0_144 raven__Utg193534:1.0-988445.0_66
grep "contig_820:1.0-90197.0_61" clusters.tsv
# flye__contig_820:1.0-90197.0_61 flye__contig_820:1.0-90197.0_61
```
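- `contig_820:1.0-90197.0_60` is in a cluster (representative `flye__contig_2250:1.0-180537.0_144`) with other proteins having a nudged tetW hit
- `contig_820:1.0-90197.0_61` is unique (clusters only with itself)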
# metaP
```bash
# note: inside $(...), \'${pid}\' passes the ID to grep wrapped in literal single quotes
cat ~/tmp/gdb_amr_subset.txt | while read pid; do echo -e "${pid}: $(grep -F \'${pid}\' all_proteins_1_1_Default_Peptide_Report.txt | wc -l)"; done
# megahit__k141_88921_2: 0
# megahit__k141_44684_172: 0
# megahit__k141_37405_1: 0
# megahit__k141_59580_1: 0
# megahit__k141_105836_1: 0
# metaspades__NODE_12_length_289084_cov_116.206895_175: 0
# metaspades__NODE_3961_length_5598_cov_17.248325_7: 0
# metaspades__NODE_8958_length_2718_cov_51.879591_1: 0
# flye__contig_2250:1.0-180537.0_142: 0
# flye__contig_2250:1.0-180537.0_144: 0
# flye__contig_4930:1.0-5516.0_1: 0
# flye__contig_820:1.0-90197.0_60: 0
# flye__contig_820:1.0-90197.0_61: 0
# flye__contig_88:1.0-843327.0_578: 0
# raven__Utg192512:1.0-55869.0_70: 0
# raven__Utg193258:1.0-498612.0_132: 0
# raven__Utg193534:1.0-988445.0_66: 0
# raven__Utg193534:1.0-988445.0_67: 0
# raven__Utg194422:1.0-42550.0_3: 0
# raven__Utg194422:1.0-42550.0_4: 0
# raven__Utg196264:1.0-23463.0_40: 0
# raven__Utg196264:1.0-23463.0_41: 0
# raven__Utg197488:1.0-10276.0_3: 0
# metaspadeshybrid__NODE_1_length_1822148_cov_145.496205_549: 0
# metaspadeshybrid__NODE_230_length_104252_cov_9.790458_35: 0
# metaspadeshybrid__NODE_348_length_69013_cov_11.179239_18: 0
# metaspadeshybrid__NODE_5177_length_5598_cov_17.248325_7: 0
# operamsmegahit__opera_contig_1207_6: 0
# operamsmegahit__opera_contig_1626_1: 0
# operamsmegahit__opera_contig_17221_1: 0
# operamsmegahit__opera_contig_76903_177: 0
# operamsmetaspades__opera_contig_1579_7: 0
# operamsmetaspades__opera_contig_2775_1: 0
# operamsmetaspades__opera_contig_190543_580: 0
```
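The counts above were obtained with the command as written, in which each ID reaches `grep` enclosed in literal single quotes. A corrected sketch of the count loop that passes each ID as-is; the helper name `count_peptide_hits` is hypothetical, and `-w` assumes that IDs in the peptide report are delimited by non-word characters:

```bash
# For each protein ID in IDS_FILE (one per line), print "ID: N", where N is the
# number of lines in REPORT that contain the ID as a whole word.
# ASSUMPTION: IDs in REPORT are delimited by non-word characters (-w), so that
# a line with "..._20" is not counted for "..._2".
count_peptide_hits() {
    local ids_file=$1 report=$2 pid
    while IFS= read -r pid; do
        [ -n "$pid" ] || continue
        # -F: fixed string, -c: count matching lines (prints 0 if none)
        printf '%s: %s\n' "$pid" "$(grep -F -w -c -- "$pid" "$report")"
    done < "$ids_file"
}

# usage:
# count_peptide_hits ~/tmp/gdb_amr_subset.txt all_proteins_1_1_Default_Peptide_Report.txt
```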