Documentation and code are available at [https://r3.pages.uni.lu/biocuration/resources/biohackathon2019/rdmaps/](https://r3.pages.uni.lu/biocuration/resources/biohackathon2019/rdmaps/).
### Dependencies
- Java Runtime (Java 8)
- Python 3.x
- R
- zip utility
- (Maven 3.6)
##### Python
Python needs the packages described in `associations/requirement.txt`, which can be installed
2. Extend the gene list by looking into additional resources such as OmniPath.
3. Use Ensembl to assess allele frequencies of the identified variants to
   filter out possibly non-rare variants.
4. Get detailed variant information and compile it into a [variant file](https://minerva.pages.uni.lu/doc/user_manual/v14.0/index/#genetic-variant-format) for MINERVA.
## 1. Gene-disease associations and variants
The OrphaHPO_clinvar_variants_summary is a file obtained from ClinVar FTP on 12t...
It was created by filtering ClinVar variants to keep only those having an Orphanet identifier,
and reducing its contents to GRCh37 (hg19) variants.
The script gets genes and variants for a given disease, excludes non-pathogenic variants and carries out a pairwise comparison
of genes and variants between ClinVar and all other input resources, which are provided as JSON files (see above). The
output is a report at gene and variant level, including a list of all genes and variants pertinent to the given
disease.
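
The pairwise comparison described above can be illustrated with plain set operations. This is only a minimal sketch and not the script's actual code: the function names `compare_with_clinvar` and `all_items`, the report keys, and the resource name `resource_a` are hypothetical, and the inputs are assumed to be already reduced to sets of gene symbols (or variant identifiers) per resource.

```python
from typing import Dict, Set


def compare_with_clinvar(
    clinvar: Set[str], others: Dict[str, Set[str]]
) -> Dict[str, Dict[str, Set[str]]]:
    """Compare the ClinVar set with each other resource, one pair at a time."""
    report = {}
    for name, items in others.items():
        report[name] = {
            "shared": clinvar & items,          # present in both
            "clinvar_only": clinvar - items,    # present only in ClinVar
            "resource_only": items - clinvar,   # present only in the other resource
        }
    return report


def all_items(clinvar: Set[str], others: Dict[str, Set[str]]) -> Set[str]:
    """Union of all identifiers across ClinVar and the other resources."""
    return clinvar.union(*others.values())


# Hypothetical example input: gene symbols per resource.
clinvar_genes = {"SCN5A", "KCNE5"}
other_genes = {"resource_a": {"SCN5A", "CACNA2D1"}}

gene_report = compare_with_clinvar(clinvar_genes, other_genes)
gene_union = all_items(clinvar_genes, other_genes)
```

The same two functions apply unchanged to variant identifiers, which is why the sketch works on generic sets of strings.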
It takes as *input* a file with a single column of rs# and gives as *output* in t...
The output contains the variants from the input that are either not described in the genomic databases contained in Ensembl, or for which there is at least one population with allele frequency >= the threshold given on the command line. For example, a threshold of 0.10 filters out from the input file all variants that have a minor allele frequency equal to or greater than 10% in at least one population among those described in Ensembl. Variants not present in Ensembl are also filtered out, under the assumption that they have never been described so far.
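
The filtering rule above can be written down as a small predicate. The following is a hedged sketch, not the script itself: it assumes the per-population minor allele frequencies have already been retrieved from Ensembl, and the names `is_filtered_out` and `split_variants`, as well as the placeholder identifiers `rsA`, `rsB` and `rsC`, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def is_filtered_out(population_mafs: Optional[Sequence[float]], threshold: float) -> bool:
    """Apply the rule described above to one variant.

    population_mafs is None when the variant is not present in Ensembl;
    otherwise it holds one minor allele frequency per population.
    """
    if population_mafs is None:
        # Variants unknown to Ensembl are filtered out as well.
        return True
    # Filtered out if at least one population reaches the threshold.
    # A known variant with no population data is kept (assumption of this sketch).
    return any(maf >= threshold for maf in population_mafs)


def split_variants(
    frequencies: Dict[str, Optional[Sequence[float]]], threshold: float
) -> Tuple[List[str], List[str]]:
    """Return (filtered_out, kept) identifiers, preserving input order."""
    filtered_out = [rs for rs, mafs in frequencies.items() if is_filtered_out(mafs, threshold)]
    kept = [rs for rs in frequencies if rs not in filtered_out]
    return filtered_out, kept


# Hypothetical example input: placeholder identifiers, made-up frequencies.
example = {"rsA": [0.01, 0.12], "rsB": [0.001, 0.02], "rsC": None}
filtered_out, kept = split_variants(example, threshold=0.10)
```

With a threshold of 0.10, `rsA` is filtered out because one population reaches 12%, `rsC` because it is unknown, and `rsB` is kept.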
###### Comment on efficiency
## 3. Obtaining variant information
The script `minerva_variants.py` connects to the [MyVariant.info](http://myvariant.info/) API and, for each of the provided
variant identifiers (given as a list of dbSNP Reference SNP numbers), gets the genetic location
and the protein-level mapping. The protein-level mapping in MyVariant.info comes from the [dbNSFP database](https://sites.google.com/site/jpopgen/dbNSFP).
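
As a small illustration of the genetic-location part, the sketch below parses an HGVS-style genomic identifier of the kind MyVariant.info uses as variant IDs (for example `chr3:g.38592933G>A`) into contig, position and alleles. It is an assumption-laden example, not code from `minerva_variants.py`: the names `Snv` and `parse_snv_id` are hypothetical, and only single-nucleotide substitutions are handled.

```python
import re
from typing import NamedTuple, Optional


class Snv(NamedTuple):
    """Genomic location of a single-nucleotide variant (hypothetical helper type)."""
    contig: str
    position: int
    original_dna: str
    alternative_dna: str


# Matches identifiers such as "chr3:g.38592933G>A"; other variant types do not match.
_SNV_ID = re.compile(
    r"^chr(?P<contig>[0-9A-Za-z]+):g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_snv_id(hgvs_id: str) -> Optional[Snv]:
    """Parse an HGVS-style genomic SNV identifier, or return None if it is not an SNV."""
    match = _SNV_ID.match(hgvs_id)
    if match is None:
        return None
    return Snv(match["contig"], int(match["pos"]), match["ref"], match["alt"])


parsed = parse_snv_id("chr3:g.38592933G>A")
not_snv = parse_snv_id("chr3:g.100_101del")
```

Returning `None` for anything that is not a plain substitution keeps the caller in charge of how to treat insertions, deletions and malformed identifiers.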
The output of `minerva_variants.py` is formatted as a MINERVA [genetic variant overlay file](https://minerva.pages.uni.lu/doc/user_manual/v14.0/index/#genetic-variant-format). An example follows:
```text
#NAME=DISEASE_ASSOCIATED_VARIANTS
#TYPE=GENETIC_VARIANT
#GENOME_TYPE=UCSC
#GENOME_VERSION=hg19
position original_dna alternative_dna gene_name description color contig allele_frequency amino_acid_change identifier_uniprot
38592933 G A SCN5A snv #ff0000 3 0.8 SCN5A:XXX:g.38592933G>A:p.R1626C H9KVD2
38592729 C T SCN5A snv #ff0000 3 0.8 SCN5A:XXX:g.38592729C>T:p.G1694S H9KVD2
38627199 C T SCN5A snv #ff0000 3 0.8 SCN5A:XXX:g.38627199C>T:p.V924I H9KVD2
38627199 C A SCN5A snv #ff0000 3 0.8 SCN5A:XXX:g.38627199C>A:p.V924F H9KVD2
108867945 T G KCNE5 snv #ff0000 X 0.8 KCNE5:XXX:g.108867945T>G:p.E102A Q9UJ90
81667462 A T CACNA2D1 snv #ff0000 7 0.8 CACNA2D1:XXX:g.81667462A>T:p.N323K P54289-2
```
This code assembles resources into a single map. It consists of:
1. Assembling the input pathways into an SBML-formatted map file.
2. Postprocessing of the map file.
## 1. Assembling the input pathways
#### Description
The code is based on the [minerva source code](https://git-r3lab.uni.lu/minerva/core/tree/master). It downloads the list
of pathways provided as input (from WikiPathways or MINERVA instances) and merges them into a single file.
Additionally, data from text mining can be taken as input, and this information is incorporated into the whole map.
The exported file is an SBML file with the layout, render and multi packages.
## License
GNU Affero General Public License v3.0