Commit 7631ecf7 authored by David Hoksza

enrichment integrated in the pipeline. updated readme

parent f59b34b0
......@@ -7,6 +7,30 @@ Code and comments to [https://r3.pages.uni.lu/biocuration/resources/biohackathon
- Python 3.x
- R
If the pipeline is run on a clean Linux installation, you might need to install the following libraries
(`sudo apt-get install` on Ubuntu) before running the code:
- libcurl
- libssl-dev
- libxml2-dev
### Execution
To execute the pipeline, set the values of parameters in the `PARAMETERS TO SET`
section of the `assemble.sh` script file. The main parameter here is the list of
[Orphanet](https://www.orpha.net/consor/cgi-bin/index.php?lng=EN) disease numbers.
When the parameters are set, run the pipeline:
```
bash assemble.sh
```
This runs the pipeline described in the next section and outputs a
ZIP file with a map that can then be imported into [MINERVA](https://minerva.pages.uni.lu/doc/)
as a disease map integrating all enriched pathways found, together with
genetic and variant [overlays](https://minerva.pages.uni.lu/doc/user_manual/v14.0/index/#overlays-tab).
### Pipeline
- [Retrieval of gene-disease mapping and variants](associations/README.md).
\ No newline at end of file
1. [Retrieval of gene-disease mapping and variants](associations/README.md).
2. [Enrichment]()
\ No newline at end of file
#!/usr/bin/env bash
# ------------------------- PARAMETERS TO SET -------------------------
ORPHANET_IDS="130"
DISGENET_CNT_THRESHOLD=50
# ------------------------- PARAMETERS TO SET -------------------------
ASSOCIATIONS_DIR=associations/
ASSOCIATIONS_DATA_DIR=$ASSOCIATIONS_DIR/data/
RES_DIR=results
ENRICHMENT_DIR=enrichment/
ENRICHMENT_CONFIG=${ENRICHMENT_DIR}/config.txt
PYTHON_BIN=python3
ORPHANET_IDS="130"
#ORPHANET_IDS="33,67046,79327,79321,86309"
ORPHANET_IDS_UNDERSCORE=${ORPHANET_IDS//,/_}
DISGENET_CNT_THRESHOLD=50
mkdir $RES_DIR
......@@ -23,7 +28,7 @@ echo "Integration with ClinVar stored in ${genes_variants_out_path}"
genes_line=`cat ${genes_variants_out_path} | grep "genes in total"`
genes_out_path=${genes_variants_out_path/02-genes_variants/03-genes}
echo ${genes_line#*:} > ${genes_out_path}
echo ${genes_line#*:} | sed 's/\,/\n/g' > ${genes_out_path}
echo "Genes stored in ${genes_out_path}"
minerva_genes_out_path=${RES_DIR}/04-minerva-genes-id_${ORPHANET_IDS_UNDERSCORE}.txt
......@@ -31,13 +36,17 @@ $PYTHON_BIN $ASSOCIATIONS_DIR/minerva_genes.py -f ${genes_out_path} > ${minerva_
var_line=`cat ${genes_variants_out_path} | grep "variants in total"`
variants_out_path=${genes_variants_out_path/02-genes_variants/03-variants}
echo ${var_line#*:} > ${variants_out_path}
echo ${var_line#*:} | sed 's/\,/\n/g' > ${variants_out_path}
echo "Variants stored in ${variants_out_path}"
minerva_variants_out_path=${RES_DIR}/04-minerva-variants-id_${ORPHANET_IDS_UNDERSCORE}.txt
$PYTHON_BIN $ASSOCIATIONS_DIR/minerva_variants.py -f ${variants_out_path} > ${minerva_variants_out_path}
# ------------------------------ 2. Obtain pathways ------------------------------
#R -f path_to_script
echo "Rscript --vanilla enrichment/enrich_maps.R enrich_maps.R ${genes_out_path} ${ENRICHMENT_CONFIG}"
enriched_maps_out_path=$RES_DIR/05-enriched_disease_maps.txt
enriched_paths_out_path=$RES_DIR/05-enriched_pathways.txt
cp enriched_disease_maps.txt ${enriched_maps_out_path}
cp enriched_pathways.txt ${enriched_paths_out_path}
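Review note: the script hunks above switch the gene and variant files from one comma-separated line to one item per line, using `echo ${line#*:} | sed 's/\,/\n/g'`. The sketch below mirrors that transformation in Python to make the intended file format explicit. The function name `summary_to_lines` and the sample summary line are hypothetical; the exact text produced by the upstream step is not shown in this diff.

```python
def summary_to_lines(summary_line: str) -> str:
    # Mirrors: echo ${line#*:} | sed 's/\,/\n/g'
    # `${line#*:}` drops the shortest prefix ending in ':'.
    _, sep, rest = summary_line.partition(":")
    if not sep:
        # No colon present: the shell expansion leaves the line unchanged.
        rest = summary_line
    # An unquoted `echo` argument is word-split, which collapses runs of
    # whitespace and trims both ends.
    rest = " ".join(rest.split())
    # sed turns every comma into a newline; echo appends a final newline.
    return rest.replace(",", "\n") + "\n"


# Hypothetical summary line, for illustration only.
print(summary_to_lines("3 genes in total: BRCA1,TP53,EGFR"), end="")
```

Note that `\n` in a `sed` replacement is a GNU sed behaviour; on other platforms `tr ',' '\n'` would likely be a more portable way to get the same result.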
......@@ -29,4 +29,4 @@ if __name__ == '__main__':
with open(args.file) as f:
gene_symbols = f.read()
print(get_minerva_format(gene_symbols.strip().split(",")))
print(get_minerva_format(gene_symbols.strip().split("\n")))
......@@ -168,5 +168,5 @@ if __name__ == '__main__':
with open(args.file) as f:
dbsnp_ids = f.read()
db_snps = get_dbsnp(dbsnp_ids.split(","))
db_snps = get_dbsnp(dbsnp_ids.split("\n"))
print(get_minerva_format(remove_snps_with_multiple_uniprot_ids(db_snps)))
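Review note: `minerva_genes.py` calls `strip()` before `split("\n")`, while `minerva_variants.py` splits the raw file content. Because the file written by `echo ... | sed ...` ends with a newline, the raw split yields a trailing empty string. Whether `get_dbsnp` tolerates an empty ID is not visible in this diff, so the following is only a sketch of a more defensive parse; `read_ids` is a hypothetical helper and the IDs are made up.

```python
from typing import List


def read_ids(text: str) -> List[str]:
    """Return the non-empty, whitespace-trimmed lines of `text`."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# A file written by `echo ... | sed ...` ends with a newline.
sample = "rs123\nrs456\n"  # hypothetical dbSNP IDs, for illustration only

print(sample.split("\n"))  # ['rs123', 'rs456', '']
print(read_ids(sample))    # ['rs123', 'rs456']
```

The same helper would also drop the leading spaces that appear if the upstream list is separated by a comma followed by a space.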