# Biohackathon 2019: Rare disease maps

Documentation and code for [https://r3.pages.uni.lu/biocuration/resources/biohackathon2019/rdmaps/](https://r3.pages.uni.lu/biocuration/resources/biohackathon2019/rdmaps/).

### Dependencies

- Bash
- Python 3.x
- R
- Java 8
- Maven 3.6 (optional, only needed to compile `map_generator` from source)


##### Python

The Python scripts need the packages listed in `dependencies/python_requirements.txt`, which can be installed
via

```commandline
pip3 install -r dependencies/python_requirements.txt
```

##### Java

If you want to compile `map_generator` from source, you need Maven (a Java build tool) installed on your machine.
Maven can be installed on Ubuntu via apt (`sudo apt install maven`).
However, this should not be necessary, as the latest compiled version is shipped with the pipeline.
By default, compilation is turned off. To turn it on, set the value of
`BUILD_MAP_GENERATOR` in `parameters.sh` to 1.
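
Assuming `parameters.sh` consists of plain Bash variable assignments (as its `.sh` extension suggests; check the shipped file for its exact form), the change amounts to a single line:

```bash
# parameters.sh (excerpt, assumed format)
# Setting this to 1 compiles map_generator with Maven during the run;
# otherwise the compiled version shipped with the pipeline is used.
BUILD_MAP_GENERATOR=1
```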


##### R

The R scripts that are part of the pipeline should install their dependencies on the first run. However,
this may require elevated privileges. If that is the case and the pipeline fails
while executing the R scripts, you can install the dependencies manually by running:

```bash
sudo Rscript --vanilla dependencies/dependencies.R
``` 

##### System

If the pipeline is run on a clean Linux installation, you might need to install the following libraries
(`sudo apt-get install` on Ubuntu) before running the code:
- libcurl
- libssl-dev
- libxml2-dev
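
On Debian/Ubuntu, a quick check for which of these are still missing could look like the sketch below. It assumes `dpkg-query` is available and that `libcurl4-openssl-dev` is the package that provides libcurl for your setup (Ubuntu also offers other libcurl development variants); the function name `missing_packages` is hypothetical, and package names may need adjusting for other distributions.

```bash
# Hypothetical helper: print each package from the argument list that dpkg does not
# report as installed. If dpkg-query is unavailable, every package is reported.
missing_packages() {
  for pkg in "$@"; do
    if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'ok installed'; then
      echo "$pkg"
    fi
  done
}

# Assumption: libcurl4-openssl-dev provides the libcurl development files.
required="libcurl4-openssl-dev libssl-dev libxml2-dev"

# $required is intentionally unquoted so that it splits into separate package names
missing=$(missing_packages $required)

if [ -n "$missing" ]; then
  # Unquoted $missing turns the newline-separated list into a space-separated one
  echo "Missing packages:" $missing
  echo "Install them with: sudo apt-get install" $missing
else
  echo "All listed system packages are installed."
fi
```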

## Pipeline execution

To execute the pipeline, set the parameter values in `parameters.sh`, most importantly
the list of [Orphanet](https://www.orpha.net/consor/cgi-bin/index.php?lng=EN) disease numbers.

When the parameters are set, run the pipeline:

```
bash assemble.sh 
```

You can also pass the parameters file as an additional argument if it differs from `parameters.sh`.
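
The optional argument works like a default value. The following is a generic sketch of that convention, not code taken from `assemble.sh`, whose actual handling may differ; the function name `resolve_params_file` is hypothetical.

```bash
# Hypothetical helper: use the first argument as the parameters file,
# falling back to parameters.sh when no argument is given.
resolve_params_file() {
  echo "${1:-parameters.sh}"
}

params_file=$(resolve_params_file "$@")

if [ -f "$params_file" ]; then
  # Source the file so its variable assignments (e.g. the disease numbers) become visible here
  . "$params_file"
else
  echo "Parameters file not found: $params_file" >&2
fi
```

Under this convention, `bash assemble.sh` reads `parameters.sh`, while `bash assemble.sh my_parameters.sh` (a hypothetical file name) reads the alternative file instead.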

This runs the pipeline described in the next section and eventually outputs a
ZIP file with a map that can be imported into [MINERVA](https://minerva.pages.uni.lu/doc/)
as a disease map integrating all the enriched pathways found, together with
genetic and variant [overlays](https://minerva.pages.uni.lu/doc/user_manual/v14.0/index/#overlays-tab).

## Pipeline description

The pipeline consists of several tools, which can be divided into three broad categories:

1. [Retrieval of gene-disease mappings and variants](associations/README.md)
2. [Enrichment](enrichment/README.md)
3. [Map assembly](map_generator/README.md)

The tools are glued together by the `assemble.sh` script, resulting in the following pipeline:

1. Obtain gene-disease and variant-disease mappings from DisGeNET.
2. Obtain gene-disease and variant-disease mappings from OpenTargets.
3. Obtain possibly pathogenic ClinVar variants and genes pertinent to the given disease.
4. Compile a list of genes associated with the disease from all the input sources.
5. Extend the list of genes using other resources, such as OmniPath, or text mining.
6. Compile a list of variants associated with the disease from all the input sources.
7. Filter out variants with high allele frequency using Ensembl's VEP (Variant Effect Predictor) service.
8. Obtain variant information (position, protein-level mapping) and store it for the MINERVA genetic variant overlay.
9. From resources such as existing disease maps or WikiPathways, obtain pathways enriched
with respect to the disease-associated genes obtained in the previous steps.
10. Compile the obtained pathways into a single disease map.
11. Bundle the disease map with the genetic and variant overlays into a single archive to be uploaded to [MINERVA](https://minerva.pages.uni.lu/).
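
Steps 4 and 7 boil down to simple list operations. The sketch below illustrates them with standard Unix tools. The file layouts (one gene symbol per line; a tab-separated variant ID and allele frequency per line), the function names and the example threshold are all assumptions, not the formats or code of the pipeline's own tools, and the VEP query that supplies the frequencies is not shown.

```bash
# Hypothetical sketch of steps 4 and 7; formats, names and threshold are assumptions.

# Step 4 (sketch): merge per-source gene lists (one gene symbol per line)
# into a single de-duplicated list, skipping blank lines.
# LC_ALL=C makes the sort order independent of the locale.
merge_gene_lists() {
  grep -hv '^[[:space:]]*$' "$@" | LC_ALL=C sort -u
}

# Step 7 (sketch): from a tab-separated "variant_id<TAB>allele_frequency" file ($1),
# keep variants whose allele frequency is below the threshold ($2).
# Variants without a frequency ("NA" or empty) are kept, as they are not known to be common.
filter_common_variants() {
  awk -F '\t' -v max_af="$2" '$2 == "" || $2 == "NA" || $2 + 0 < max_af + 0' "$1"
}
```

For instance, `filter_common_variants variants.tsv 0.01` would keep only variants with a reported allele frequency below 1%; this cut-off is illustrative only.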


## License

GNU Affero General Public License v3.0