Renaming proteins in FASTA for CD-HIT analysis
A new ID strategy is needed for the CD-HIT step: the new IDs cannot be
directly linked to the original IDs from the Prodigal FASTA file.
- keep the original ID
- add a prefix with the read type and tool name, e.g. sr_megahit
Advantages:
- link to original ID
- link to assembly tool and type of the input reads
- easy to process to extract required information
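
A minimal sketch of the scheme above, not the actual implementation. The
"__" separator between prefix and original ID, the function names, and the
sample header are assumptions made for illustration; the note itself only
fixes the prefix form (read type, underscore, tool name).

```python
from typing import Iterable, Iterator, Tuple

SEP = "__"  # assumed separator between prefix and original ID (not specified in the note)


def prefix_fasta_ids(lines: Iterable[str], prefix: str, sep: str = SEP) -> Iterator[str]:
    """Yield FASTA lines with `prefix + sep` prepended to every record ID."""
    for line in lines:
        if line.startswith(">"):
            # Keep the original header (ID and description) intact after the prefix.
            yield f">{prefix}{sep}{line[1:]}"
        else:
            # Sequence lines pass through unchanged.
            yield line


def split_prefixed_id(header: str, sep: str = SEP) -> Tuple[str, str, str]:
    """Return (read_type, tool, original_id) from a prefixed header line."""
    new_id = header.lstrip(">").split()[0]      # drop '>' and any description
    prefix, original_id = new_id.split(sep, 1)  # split on the first separator only
    read_type, tool = prefix.split("_", 1)
    return read_type, tool, original_id


# Hypothetical Prodigal-style record, for illustration only.
records = [
    ">k141_1_1 # 2 # 181 # 1 # ID=1_1\n",
    "MKV*\n",
]
renamed = list(prefix_fasta_ids(records, "sr_megahit"))
print(renamed[0], end="")
print(split_prefixed_id(renamed[0]))
```

Because the original ID is kept verbatim after the separator, mapping a
cluster member back to its source record only needs a string split, with
no lookup table.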