# Map generator

This code assembles pathway resources into a single map. The process consists of two steps:

1. Assembling the input pathways into a SBML-formatted map file.
2. Postprocessing of the map file.

## 1. Assembling the input pathways 

#### Description

The code is based on the [minerva source code](https://git-r3lab.uni.lu/minerva/core/tree/master). It downloads a list
of pathways provided as input (from WikiPathways or MINERVA instances) and merges them into a single file.
Additionally, text-mining data can be provided as input, and this information is incorporated into the whole map.
The exported file is an SBML file that uses the layout, render and multi packages.
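To check that an exported file actually declares the three SBML Level 3 packages mentioned above, one can look for their namespace URIs. The sketch below uses only Python's standard library. The function name `missing_packages` is hypothetical, and the URIs are those of version 1 of each package specification, which is an assumption about the versions the exporter writes.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the SBML Level 3 Version 1 packages.
# Assumption: the exporter writes version 1 of each package.
PACKAGE_NAMESPACES = {
    "layout": "http://www.sbml.org/sbml/level3/version1/layout/version1",
    "render": "http://www.sbml.org/sbml/level3/version1/render/version1",
    "multi": "http://www.sbml.org/sbml/level3/version1/multi/version1",
}


def missing_packages(source):
    """Return the names of expected packages whose namespace is never declared.

    `source` may be a file name or a binary file object (hypothetical helper).
    """
    # "start-ns" events yield (prefix, uri) for every namespace declaration in the file.
    declared = {uri for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",))}
    return sorted(name for name, uri in PACKAGE_NAMESPACES.items() if uri not in declared)
```

For example, `missing_packages("output.xml")` returns an empty list when all three package namespaces are declared somewhere in the file.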

#### Compilation

```
mvn -DskipTests=true clean install -pl biohackathon -am
```

The compiled runnable JAR will be located at `biohackathon/target/biohackathon-1.0-jar-with-dependencies.jar`.

#### Execution

To list the available parameters, run the JAR without arguments:

```
java -jar biohackathon/target/biohackathon-1.0-jar-with-dependencies.jar
```

Sample usage:

```
java -jar biohackathon/target/biohackathon-1.0-jar-with-dependencies.jar \
  --enrichr-file biohackathon/data/enrichr_output.txt \
  --minerva-file biohackathon/data/example_pathway_to_pull.txt \
  --text-mining-file biohackathon/data/brugada_output_file_omnipath.tsv \
  --output-file output.xml
```

## 2. Map file post-processing

The resulting file needs to be modified in order to:

- Attach UniProt identifiers to the species.
- Trim values whose length exceeds 255 characters.

### UniProt identifiers
Attaching UniProt identifiers ensures that the proteins carry UniProt annotations. This is needed because the genetic
variant overlay (generated in the previous step of the pipeline) is mapped onto the proteins based on both the gene name
and the UniProt annotation. If a variant maps onto a protein at the gene level but not at the UniProt level,
it will not be shown in MINERVA.

Usage:

```bash
python3 utils/implant_annotations.py -m map_file -v minerva_variants_file
```
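As an illustration of what "UniProt annotated" means in SBML, the sketch below adds a MIRIAM-style `bqbiol:is` annotation to species, using only Python's standard library. It is not the implementation of `utils/implant_annotations.py`. The function `attach_uniprot`, the gene-name-to-accession dictionary (assumed to be derived from the variants file), matching on the species `name` attribute, and the `urn:miriam:uniprot:` URI form are all assumptions.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL_NS = "http://biomodels.net/biology-qualifiers/"

# Keep readable prefixes when the tree is serialised again.
ET.register_namespace("", SBML_NS)
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("bqbiol", BQBIOL_NS)


def _child(parent, tag):
    """Return the first child with the given tag, creating it if missing."""
    found = parent.find(tag)
    return found if found is not None else ET.SubElement(parent, tag)


def attach_uniprot(sbml_text, uniprot_by_gene):
    """Hypothetical helper: annotate every species whose name is a key of uniprot_by_gene."""
    root = ET.fromstring(sbml_text)
    for species in root.iter(f"{{{SBML_NS}}}species"):
        accession = uniprot_by_gene.get(species.get("name"))
        if accession is None:
            continue
        # RDF annotations refer to the annotated element through its metaid.
        metaid = species.get("metaid")
        if metaid is None:
            metaid = "meta_" + species.get("id")
            species.set("metaid", metaid)
        annotation = species.find(f"{{{SBML_NS}}}annotation")
        if annotation is None:
            # SBML expects <annotation> right after <notes> (if any), before other children.
            annotation = ET.Element(f"{{{SBML_NS}}}annotation")
            has_notes = len(species) > 0 and species[0].tag == f"{{{SBML_NS}}}notes"
            species.insert(1 if has_notes else 0, annotation)
        rdf = _child(annotation, f"{{{RDF_NS}}}RDF")
        # Reuse the rdf:Description that describes this species, or create one.
        description = next(
            (d for d in rdf.findall(f"{{{RDF_NS}}}Description")
             if d.get(f"{{{RDF_NS}}}about") == "#" + metaid),
            None,
        )
        if description is None:
            description = ET.SubElement(
                rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": "#" + metaid}
            )
        qualifier = ET.SubElement(description, f"{{{BQBIOL_NS}}}is")
        bag = ET.SubElement(qualifier, f"{{{RDF_NS}}}Bag")
        ET.SubElement(
            bag, f"{{{RDF_NS}}}li", {f"{{{RDF_NS}}}resource": f"urn:miriam:uniprot:{accession}"}
        )
    return ET.tostring(root, encoding="unicode")
```

Species whose name is not in the dictionary are left untouched, so running the sketch on a map only adds annotations and never removes existing ones.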

### Trimming long strings
Since some strings in MINERVA (such as names or RDF resources) are limited to 255 characters,
longer strings in the input map file need to be trimmed to meet this limit. This is, for example, the case with
some RDF values (such as lists of PubMed IDs) coming from WikiPathways. For RDF resources that contain
multiple values for one identifier (e.g. a list of PubMed IDs where only one is expected), the script
retains just the first identifier. In all other cases, the string is simply trimmed to 255 characters (this
behaviour is not tested and can lead to invalid values).

```bash
python3 utils/implant_annotations.py -m map_file
```
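The trimming rules described above can be sketched in Python as follows. This is not the code of the actual script: the helper names `trim_value` and `trim_tree`, and the assumption that packed identifiers are separated by commas, semicolons or whitespace, are hypothetical. Only `name` attributes and `rdf:resource` attributes are handled, since those are the examples the rules mention.

```python
import re
import xml.etree.ElementTree as ET

MAX_LEN = 255  # MINERVA's limit for names and RDF resources
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

# Assumption: several identifiers packed into one resource are separated by
# commas, semicolons or whitespace, e.g. "urn:miriam:pubmed:123,456".
_SEPARATORS = re.compile(r"[,;\s]+")


def trim_value(value, is_rdf_resource=False, max_len=MAX_LEN):
    """Shorten value to at most max_len characters (hypothetical helper)."""
    if len(value) <= max_len:
        return value
    if is_rdf_resource:
        # Keep only the first non-empty identifier of a multi-valued resource.
        value = next((part for part in _SEPARATORS.split(value) if part), value)
    # Plain cut as a fallback; this may produce an invalid value.
    return value[:max_len]


def trim_tree(root, max_len=MAX_LEN):
    """Trim every name and rdf:resource attribute in an XML tree, in place."""
    for element in root.iter():
        if RDF_RESOURCE in element.attrib:
            element.set(RDF_RESOURCE, trim_value(element.get(RDF_RESOURCE), True, max_len))
        if "name" in element.attrib:
            element.set("name", trim_value(element.get("name"), False, max_len))
    return root
```

Keeping the first identifier rather than cutting at 255 characters means a multi-valued resource still resolves to one valid reference instead of a half-written one.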