Commit b7671b52 authored by Valentina Galata

notes: gdb, mmseqs2, metat cov (issue #121)

parent c51b0cbe
Notes for genes identified as "unique" using `mmseqs2` clustering results.
```python
# Per assembly, compute the total number of unique genes,
# and number and percentage of unique genes with ave. metaT coverage
# of at least 10x.
import pandas
df = pandas.read_csv("/scratch/users/vgalata/gdb/results/report/mmseqs2_uniq.tsv", sep="\t", header=0)
for gr_value, gr_df in df.groupby("tool"): # group by assembly/tool
    total = gr_df.shape[0]
    count = (gr_df["ave_cov"] >= 10).sum() # ave. metaT coverage >= 10x
    pct = 100 * count / total
    print("%s: %d, %d, %.2f" % (gr_value, total, count, pct))
# flye: 51750, 6520, 12.60
# megahit: 16693, 1657, 9.93
# metaspades: 13878, 1141, 8.22
# metaspadeshybrid: 27181, 1946, 7.16
# operamsmegahit: 15624, 1452, 9.29
# operamsmetaspades: 14840, 1193, 8.04
# raven: 35144, 5519, 15.70
```
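
The per-tool loop above can also be written as a single grouped summary, which returns a table instead of printed lines. This is only a minimal sketch: the helper name `summarize_unique_genes`, its `min_cov` parameter, and the toy data are hypothetical, and it assumes the input has the same `tool` and `ave_cov` columns as `mmseqs2_uniq.tsv`.

```python
import pandas as pd

def summarize_unique_genes(df, min_cov=10):
    """Per tool: total unique genes, plus count and percentage
    of genes with ave_cov >= min_cov (hypothetical helper)."""
    # Flag genes meeting the coverage threshold, then group the flags by tool
    flagged = df.assign(high_cov=df["ave_cov"] >= min_cov)
    grouped = flagged.groupby("tool")["high_cov"]
    summary = pd.DataFrame({
        "total": grouped.size(),  # all unique genes per tool
        "count": grouped.sum(),   # genes at or above the threshold
    })
    summary["pct"] = 100 * summary["count"] / summary["total"]
    return summary

# Toy data, made up for illustration (not taken from the real table)
toy = pd.DataFrame({
    "tool": ["flye", "flye", "flye", "raven", "raven"],
    "ave_cov": [12.0, 3.5, 10.0, 0.0, 25.1],
})
summary = summarize_unique_genes(toy)
print(summary.round(2))
```

Returning a DataFrame makes it easy to sort by `pct` or write the summary to a file, while the printed loop is sufficient for a quick look.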