Commit b7671b52 (parent c51b0cbe), authored Feb 26, 2021 by Valentina Galata: "notes: gdb, mmseqs2, metat cov (issue #121)"
notes/gdb_mmseqs2_uniq_metat.md
Notes on genes identified as "unique" based on the `mmseqs2` clustering results.
```python
# Per assembly, compute the total number of unique genes,
# and the number and percentage of unique genes with an ave. metaT coverage
# of at least 10x.
import pandas

df = pandas.read_csv(
    "/scratch/users/vgalata/gdb/results/report/mmseqs2_uniq.tsv",
    sep="\t",
    header=0,
)
# group by assembly/tool; a plain column name yields string group keys
for gr_value, gr_df in df.groupby(by="tool"):
    total = gr_df.shape[0]
    count = (gr_df["ave_cov"] >= 10).sum()  # ave. metaT coverage >= 10x
    pct = 100 * count / total
    print("%s: %d, %d, %.2f" % (gr_value, total, count, pct))
# flye: 51750, 6520, 12.60
# megahit: 16693, 1657, 9.93
# metaspades: 13878, 1141, 8.22
# metaspadeshybrid: 27181, 1946, 7.16
# operamsmegahit: 15624, 1452, 9.29
# operamsmetaspades: 14840, 1193, 8.04
# raven: 35144, 5519, 15.70
```
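The loop above can also be written as a single vectorized `groupby(...).agg(...)` call that returns one summary table instead of printed lines. This is only a sketch: it assumes the same `tool` and `ave_cov` columns as `mmseqs2_uniq.tsv`, while the function name `summarize_unique_genes`, the `min_cov` parameter, the output columns `total`/`n_high`/`pct`, and the toy input are hypothetical and made up for illustration. The toy values are not real data.

```python
import pandas as pd


def summarize_unique_genes(df: pd.DataFrame, min_cov: float = 10.0) -> pd.DataFrame:
    """Hypothetical helper: per assembly/tool, count unique genes, count those
    with ave. metaT coverage >= min_cov, and compute the percentage."""
    summary = (
        df.assign(is_high=df["ave_cov"] >= min_cov)  # boolean flag per gene
        .groupby("tool")
        .agg(total=("is_high", "size"), n_high=("is_high", "sum"))
    )
    summary["pct"] = 100 * summary["n_high"] / summary["total"]
    return summary


# Hypothetical toy input using the same column names as mmseqs2_uniq.tsv
toy = pd.DataFrame(
    {
        "tool": ["flye", "flye", "flye", "flye", "raven", "raven", "raven"],
        "ave_cov": [0.0, 12.5, 3.0, 10.0, 25.0, 1.0, 2.0],
    }
)
print(summarize_unique_genes(toy).round(2))
```

With the real table, `summarize_unique_genes(df)` should give the same per-tool numbers as the loop, with one row per assembly.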