# Map generator

This code assembles resources into a single map. It consists of:

1. Assembling the input pathways into an SBML-formatted map file.
2. Post-processing of the map file.

## 1. Assembling the input pathways

#### Description

The code is based on the [MINERVA source code](https://git-r3lab.uni.lu/minerva/core/tree/master). It downloads the list of pathways provided as input (from WikiPathways or MINERVA instances) and merges them into a single file. Additionally, text-mining data can be supplied as input and is then incorporated into the whole map. The exported file is an SBML file with the layout, render and multi packages.

#### Compilation

```
mvn -DskipTests=true clean install -pl biohackathon -am
```

The compiled runnable file will be located in `biohackathon/target/biohackathon-1.0-jar-with-dependencies.jar`.

#### Execution

To get information about the parameters, run:

```
java -jar biohackathon/target/biohackathon-1.0-jar-with-dependencies.jar
```

Sample usage:

```
java -jar biohackathon/target/biohackathon-1.0-jar-with-dependencies.jar --enrichr-file biohackathon/data/enrichr_output.txt --minerva-file biohackathon/data/example_pathway_to_pull.txt --text-mining-file biohackathon/data/brugada_output_file_omnipath.tsv --output-file output.xml
```

## 2. Map file post-processing

The resulting file needs to be modified in order to:

- Attach UniProt identifiers to the species.
- Trim values whose length exceeds 255 characters.

### UniProt identifiers

Attaching UniProt identifiers is necessary so that the proteins are UniProt annotated. The reason is that the genetic variant overlay (generated in the previous step of the pipeline) is mapped onto the proteins based on both gene name and UniProt annotation. If a variant matches a protein at the gene level but not at the UniProt level, it will not be shown in MINERVA.
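
The matching rule described above can be illustrated with a minimal sketch. The `Protein` and `Variant` classes, the `variant_is_shown` function and the placeholder values are hypothetical and exist only for illustration; they do not come from MINERVA or from `implant_annotations.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Protein:
    # Hypothetical, simplified view of a protein species in the map.
    gene_name: str
    uniprot_id: Optional[str] = None  # None when no UniProt annotation is attached


@dataclass
class Variant:
    # Hypothetical, simplified view of one record of the variant overlay.
    gene_name: str
    uniprot_id: str


def variant_is_shown(variant: Variant, protein: Protein) -> bool:
    """Return True only when both the gene name and the UniProt ID agree."""
    return (
        variant.gene_name == protein.gene_name
        and protein.uniprot_id is not None
        and variant.uniprot_id == protein.uniprot_id
    )


# Placeholder values, not real gene names or UniProt accessions.
variant = Variant("GENE_A", "UNIPROT_A")
annotated = Protein("GENE_A", "UNIPROT_A")
unannotated = Protein("GENE_A")

print(variant_is_shown(variant, annotated))
print(variant_is_shown(variant, unannotated))
```

In this sketch, a protein without a UniProt annotation never matches, which is why the annotation step is needed before the overlay is applied.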

Usage:

```bash
python3 utils/implant_annotations.py -m map_file -v minerva_variants_file
```

### Trimming long strings

Since the length of some strings in MINERVA (such as names or RDF resources) is limited to 255 characters, longer strings in the input map file need to be trimmed to meet this limit. This is, for example, the case with some RDF values (such as lists of PubMed IDs) coming from WikiPathways. For RDF resources that contain multiple values for one identifier (e.g. a list of PubMed IDs where there should be only one), the script retains just the first identifier. In all other cases, the string is simply trimmed to 255 characters (this behaviour is not tested and can lead to invalid values).

```bash
python3 utils/implant_annotations.py -m map_file
```
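
The trimming rule described above could look roughly like the following sketch. It is not the code of the script itself: the function name `trim_value`, the `is_rdf_resource` flag and the comma separator are assumptions made for illustration.

```python
MAX_LENGTH = 255  # length limit for MINERVA strings, as stated above


def trim_value(value: str, is_rdf_resource: bool = False, separator: str = ",") -> str:
    """Shorten `value` so that it fits within MAX_LENGTH characters.

    The separator between multiple identifiers is an assumption;
    the real input may use a different one.
    """
    if len(value) <= MAX_LENGTH:
        return value
    if is_rdf_resource and separator in value:
        # Several identifiers packed into one resource: keep only the first.
        first = value.split(separator, 1)[0].strip()
        # Guard against a first identifier that is itself too long.
        return first[:MAX_LENGTH]
    # Plain truncation; as noted above, this can produce invalid values.
    return value[:MAX_LENGTH]


# Example with made-up identifiers (40 values of 8 characters each).
pubmed_list = ",".join(["12345678"] * 40)
print(trim_value(pubmed_list, is_rdf_resource=True))
print(len(trim_value("x" * 300)))
```

Keeping the first identifier preserves a valid value where possible, whereas plain truncation may cut an identifier in half.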