David Hoksza / bh19-rare-diseases
Commit 7631ecf7, authored Nov 21, 2019 by David Hoksza
enrichment integrated in the pipeline. updated readme
parent f59b34b0
README.md
@@ -7,6 +7,30 @@ Code and comments to [https://r3.pages.uni.lu/biocuration/resources/biohackathon
 - Python 3.x
 - R
+If the pipeline is run on a clean Linux installation, you might need to install the following libraries
+(`sudo apt-get install` on Ubuntu) prior to running the code:
+- libcurl
+- libssl-dev
+- libxml2-dev
+### Execution
+To execute the pipeline, set the values of parameters in the `PARAMETERS TO SET` section of the `assemble.sh`
+script file. The main parameter here is the list of [Orphanet](https://www.orpha.net/consor/cgi-bin/index.php?lng=EN)
+disease numbers.
+When the parameters are set, run the pipeline:
+```
+bash assemble.sh
+```
+This will run the pipeline described in the next section and eventually output a
+ZIP file with a map which can then be imported into [MINERVA](https://minerva.pages.uni.lu/doc/)
+as a disease map integrating all the enriched pathways found, together with
+genetic and variant [overlays](https://minerva.pages.uni.lu/doc/user_manual/v14.0/index/#overlays-tab).
 ### Pipeline
-- [Retrieval of gene-disease mapping and variants](associations/README.md).
\ No newline at end of file
+1. [Retrieval of gene-disease mapping and variants](associations/README.md).
+2. [Enrichment]()
\ No newline at end of file
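
The README prerequisites can be checked before running `bash assemble.sh`. The following is a minimal sketch, not part of the commit: the function name `missing_tools` is hypothetical, and the executable names are assumptions taken from the README (`bash`, Python 3.x, R) and from the `Rscript` call echoed in `assemble.sh`. It does not check the `apt-get` libraries.

```python
import shutil

# Executable names assumed from the README prerequisites and assemble.sh;
# adjust them if your installation uses different names.
REQUIRED_TOOLS = ("bash", "python3", "Rscript")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found; run: bash assemble.sh")
```

An empty result only means the interpreters are reachable; R packages and system libraries may still need to be installed.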
assamble.sh → assemble.sh
 #!/usr/bin/env bash
+# ------------------------- PARAMETERS TO SET -------------------------
+ORPHANET_IDS="130"
+DISGENET_CNT_THRESHOLD=50
+# ------------------------- PARAMETERS TO SET -------------------------
 ASSOCIATIONS_DIR=associations/
 ASSOCIATIONS_DATA_DIR=$ASSOCIATIONS_DIR/data/
 RES_DIR=results
+ENRICHMENT_DIR=enrichment/
+ENRICHMENT_CONFIG=${ENRICHMENT_DIR}/config.txt
 PYTHON_BIN=python3
-ORPHANET_IDS="130"
 #ORPHANET_IDS="33,67046,79327,79321,86309"
 ORPHANET_IDS_UNDERSCORE=${ORPHANET_IDS//,/_}
-DISGENET_CNT_THRESHOLD=50
 mkdir $RES_DIR
@@ -23,7 +28,7 @@ echo "Integration with ClinVar stored in ${genes_variants_out_path}"
 genes_line=`cat ${genes_variants_out_path} | grep "genes in total"`
 genes_out_path=${genes_variants_out_path/02-genes_variants/03-genes}
-echo ${genes_line#*:} > ${genes_out_path}
+echo ${genes_line#*:} | sed 's/\,/\n/g' > ${genes_out_path}
 echo "Genes stored in ${genes_out_path}"
 minerva_genes_out_path=${RES_DIR}/04-minerva-genes-id_${ORPHANET_IDS_UNDERSCORE}.txt
@@ -31,13 +36,17 @@ $PYTHON_BIN $ASSOCIATIONS_DIR/minerva_genes.py -f ${genes_out_path} > ${minerva_
 var_line=`cat ${genes_variants_out_path} | grep "variants in total"`
 variants_out_path=${genes_variants_out_path/02-genes_variants/03-variants}
-echo ${var_line#*:} > ${variants_out_path}
+echo ${var_line#*:} | sed 's/\,/\n/g' > ${variants_out_path}
 echo "Variants stored in ${variants_out_path}"
 minerva_variants_out_path=${RES_DIR}/04-minerva-variants-id_${ORPHANET_IDS_UNDERSCORE}.txt
 $PYTHON_BIN $ASSOCIATIONS_DIR/minerva_variants.py -f ${variants_out_path} > ${minerva_variants_out_path}
 # ------------------------------ 2. Obtain pathways ------------------------------
 #R -f path_to_script
+echo "Rscript --vanilla enrichment/enrich_maps.R enrich_maps.R ${genes_out_path} ${ENRICHMENT_CONFIG}"
+enriched_maps_out_path=$RES_DIR/05-enriched_disease_maps.txt
+enriched_paths_out_path=$RES_DIR/05-enriched_pathways.txt
+cp enriched_disease_maps.txt ${enriched_maps_out_path}
+cp enriched_pathways.txt ${enriched_paths_out_path}
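
The script relies on two bash parameter expansions: `${ORPHANET_IDS//,/_}` replaces every comma with an underscore, and `${genes_line#*:}` drops everything up to and including the first colon before `sed` turns commas into newlines. The sketch below mirrors that string handling in Python for illustration only. The function names are hypothetical, and the sample summary line used in the usage note is an assumed format, since the actual contents of the "genes in total" line are not shown in this commit.

```python
def ids_to_suffix(orphanet_ids: str) -> str:
    """Mirror ${ORPHANET_IDS//,/_}: replace every comma with an underscore."""
    return orphanet_ids.replace(",", "_")


def values_one_per_line(summary_line: str) -> str:
    """Mirror the echo/sed step: keep the text after the first colon,
    then put each comma-separated value on its own line."""
    # ${line#*:} removes the shortest prefix ending in ':'; when the line
    # contains no colon, bash leaves the value unchanged.
    if ":" in summary_line:
        values = summary_line.split(":", 1)[1]
    else:
        values = summary_line
    # An unquoted echo trims and collapses whitespace and appends a newline.
    values = " ".join(values.split())
    return values.replace(",", "\n") + "\n"
```

For example, `ids_to_suffix("33,67046")` yields `33_67046`, and `values_one_per_line("3 genes in total: GENE1,GENE2,GENE3")` yields the three symbols on separate lines, which is the layout the updated Python scripts expect.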
associations/minerva_genes.py
@@ -29,4 +29,4 @@ if __name__ == '__main__':
     with open(args.file) as f:
         gene_symbols = f.read()
-        print(get_minerva_format(gene_symbols.strip().split(",")))
+        print(get_minerva_format(gene_symbols.strip().split("\n")))
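
This one-line change follows from the new file layout written by `assemble.sh`, which now stores one symbol per line instead of a single comma-separated line. A small sketch of the difference, using placeholder symbols and a hypothetical helper name `parse_symbols` (neither is from the repository):

```python
def parse_symbols(text: str) -> list:
    """Split newline-delimited text into symbols, as the updated call does."""
    return text.strip().split("\n")


# Placeholder content in the new one-symbol-per-line layout.
new_style_content = "GENE1\nGENE2\nGENE3\n"

# The updated call separates the symbols.
print(parse_symbols(new_style_content))

# The previous call finds no commas, so everything stays in one entry.
print(new_style_content.strip().split(","))
```

The two changes therefore have to land together: the old parser applied to the new file would pass a single merged string to `get_minerva_format`.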
associations/minerva_variants.py
@@ -168,5 +168,5 @@ if __name__ == '__main__':
     with open(args.file) as f:
         dbsnp_ids = f.read()
         db_snps = get_dbsnp(dbsnp_ids.split("\n"))
         print(get_minerva_format(remove_snps_with_multiple_uniprot_ids(db_snps)))
-        db_snps = get_dbsnp(dbsnp_ids.split(","))
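
Unlike `minerva_genes.py`, this call splits the raw file content without `strip()`. If the input ends with a newline, as output produced through `echo` normally does, `split("\n")` returns a trailing empty string. Whether `get_dbsnp` tolerates an empty identifier is not visible in this diff, so the following is only a defensive sketch with a hypothetical helper name `parse_ids`, not the author's code:

```python
def parse_ids(text: str) -> list:
    """Return the non-empty, whitespace-trimmed lines of `text`."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Placeholder identifiers; a trailing newline is typical for echo output.
content = "rs1\nrs2\n"

print(content.split("\n"))   # includes a trailing empty string
print(parse_ids(content))    # only the identifiers
```

`str.splitlines()` also handles `\r\n` line endings, which helps if the identifier file was edited on another platform.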