# Biohackathon 2019: Rare disease maps

Documentation and code for [https://r3.pages.uni.lu/biocuration/resources/biohackathon2019/rdmaps/](https://r3.pages.uni.lu/biocuration/resources/biohackathon2019/rdmaps/).

### Dependencies

- Bash
- Python 3.x
- R
- Java 8
- Maven 3.6 (optional, only needed to compile `map_generator`)
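
A quick way to see which of these tools are already available is to look them up on `PATH`. The following is a minimal sketch; the executable names (`python3`, `Rscript`, `java`, `mvn`) are assumptions about a typical installation and are not defined by the pipeline itself.

```bash
# Print the name of every requested tool that cannot be found on PATH.
# Assumed executable names: python3, Rscript, java, mvn (typical, not pipeline-defined).
check_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}
```

For example, `check_tools bash python3 Rscript java mvn` prints nothing when all the tools are installed. It only checks presence; the Java version can be inspected separately with `java -version`.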


##### Python

Python needs the packages listed in `dependencies/python_requirements.txt`, which can be installed
via

```commandline
pip3 install -r dependencies/python_requirements.txt
```

##### Java

Maven needs to be installed only if you want to compile `map_generator` from source. This might not be
necessary, since a compiled version is provided with the code as well. By default, the compilation is not
carried out; this can be changed by setting `BUILD_MAP_GENERATOR` in `parameters.sh` to 1.
On Ubuntu, Maven can be installed via apt (`sudo apt install maven`).
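
If you do want to build `map_generator`, the flag can also be switched from the command line. The sketch below assumes that `parameters.sh` contains the assignment on its own line in the form `BUILD_MAP_GENERATOR=<value>`; the function name `enable_map_generator_build` is hypothetical.

```bash
# Hypothetical helper: set BUILD_MAP_GENERATOR=1 in a parameters file.
# Assumes the file contains a line of the form BUILD_MAP_GENERATOR=<value>.
enable_map_generator_build() {
  local params_file="${1:-parameters.sh}"
  local tmp status
  tmp="$(mktemp)" || return 1
  # Rewrite only the matching assignment line; all other lines are copied unchanged.
  # Writing through a temporary file avoids the GNU/BSD differences of `sed -i`,
  # and copying back with cat keeps the original file's permissions.
  sed 's/^BUILD_MAP_GENERATOR=.*/BUILD_MAP_GENERATOR=1/' "$params_file" > "$tmp" \
    && cat "$tmp" > "$params_file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Running `enable_map_generator_build` without arguments edits `parameters.sh` in the current directory.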

##### R

The R scripts in the pipeline should be able to install their dependencies on the first run. However,
elevated privileges might be needed. If that is the case and the pipeline fails
on the R scripts, you can install the dependencies by running:

```bash
sudo Rscript --vanilla dependencies/dependencies.R
``` 

##### System

If the pipeline is run on a clean Linux installation, you might need to install the following libraries
(`sudo apt-get install` on Ubuntu) before running the code:
- libcurl
- libssl-dev
- libxml2-dev
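
On Debian or Ubuntu, you can first check which of these packages are missing. This is a minimal sketch built on `dpkg-query`; the concrete package name `libcurl4-openssl-dev` used in the example below is an assumption for the libcurl development files (Ubuntu also offers other libcurl variants).

```bash
# Print every package from the argument list that dpkg does not report as installed.
missing_system_packages() {
  local pkg
  for pkg in "$@"; do
    # ${Status} is a dpkg-query format field, single-quoted so the shell does not expand it.
    if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'install ok installed'; then
      echo "$pkg"
    fi
  done
}
```

For example, `missing=$(missing_system_packages libcurl4-openssl-dev libssl-dev libxml2-dev)` followed by `[ -n "$missing" ] && sudo apt-get install $missing` installs only what is missing (`$missing` is intentionally unquoted so that each package becomes a separate argument).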

## Pipeline execution

To execute the pipeline, set the parameter values in `parameters.sh`, mainly
the list of [Orphanet](https://www.orpha.net/consor/cgi-bin/index.php?lng=EN) disease numbers.
When the parameters are set, run the pipeline:

```bash
bash assemble.sh 
```

`assemble.sh` also accepts an optional argument: the path to a parameters file, if you want to use a file other than `parameters.sh`.
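
An optional argument like this typically follows a common Bash pattern: use the first positional argument if given, otherwise fall back to a default. The sketch below only illustrates that pattern; the function `load_parameters` is hypothetical and is not taken from `assemble.sh`.

```bash
# Hypothetical illustration: load parameters from the file given as the first
# argument, or from parameters.sh when no argument is passed.
load_parameters() {
  local params_file="${1:-parameters.sh}"
  if [ ! -f "$params_file" ]; then
    echo "Parameters file not found: $params_file" >&2
    return 1
  fi
  # Sourcing runs the file in the current shell, so its variable
  # assignments remain visible to the rest of the script.
  . "$params_file"
}
```

In practice, copy `parameters.sh`, edit the copy (e.g. `my_parameters.sh`, a hypothetical name), and run `bash assemble.sh my_parameters.sh`.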

This runs the pipeline described in the next section and eventually outputs a
ZIP file with a map, which can then be imported into [MINERVA](https://minerva.pages.uni.lu/doc/)
as a disease map integrating all the enriched pathways found, together with
genetic and variant [overlays](https://minerva.pages.uni.lu/doc/user_manual/v14.0/index/#overlays-tab).

## Pipeline description

The pipeline consists of several tools, which can be divided into three broad categories:

1. [Retrieval of gene-disease mapping and variants](associations/README.md).
2. [Enrichment](enrichment/README.md)
3. [Map assembly](map_generator/README.md)

The tools are glued together by the `assemble.sh` script, resulting in the following pipeline:

1. Obtain gene-disease and variant-disease mapping from DisGeNET. 
2. Obtain gene-disease and variant-disease mapping from OpenTargets.
3. Obtain possibly pathogenic ClinVar variants and genes pertinent to the given disease.
4. Compile a list of genes associated with the disease from all the input sources.
5. Extend the list of genes using other resources such as OmniPath or text mining.
6. Compile a list of variants associated with the disease from all the input sources.
7. Filter out variants with high allele frequency using Ensembl's VEP (Variant Effect Predictor) service.
8. Obtain variant information (position, protein-level mapping) and store it for MINERVA genetic variant overlay.
9. Obtain pathways enriched with respect to the disease-associated genes from the previous steps,
using resources such as existing disease maps or WikiPathways.
10. Compile the obtained pathways into a single disease map.
11. Bundle the disease map with genetic and variant overlays into a single archive to be then uploaded to [MINERVA](https://minerva.pages.uni.lu/).  
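
Two of the steps above reduce to simple text processing once the inputs are in plain files. The sketch below assumes formats that are not taken from the pipeline: for step 4, one gene symbol per line in each per-source file; for step 7, a tab-separated file with a header and the columns `variant_id`, `gene`, `allele_freq`, where the frequency has already been retrieved (e.g. from VEP). The function names and the default threshold of 0.01 are hypothetical.

```bash
# Step 4 (sketch): merge per-source gene lists into one sorted list without
# duplicates. Empty lines are dropped; LC_ALL=C makes the ordering locale-independent.
merge_gene_lists() {
  cat "$@" | sed '/^[[:space:]]*$/d' | LC_ALL=C sort -u
}

# Step 7 (sketch): keep the header and every variant whose allele frequency
# (3rd column) is a number not above the threshold; rows with a missing or
# non-numeric frequency are dropped.
filter_common_variants() {
  local tsv="$1"
  local max_af="${2:-0.01}"
  awk -F '\t' -v max="$max_af" \
    'NR == 1 || ($3 ~ /^[0-9.eE+-]+$/ && $3 + 0 <= max + 0)' "$tsv"
}
```

Dropping variants with an unknown frequency is a choice made for this sketch; keeping them instead only requires changing the awk condition.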


## License

GNU Affero General Public License v3.0