@@ -11,3 +11,24 @@ The script uses "config.txt" file, containing the list of minerva instances and
It provides output with the list of enriched map areas and pathways.
Statistical tests used for maps and pathways are different.
# **get_extended_network.R** Script for extending the original list of disease genes using PPI data
Should be run as `Rscript --vanilla get_extended_network.R <input_filename> <config_file>`
`input_filename` should contain a list of genes as HGNC symbols, one entry per line. Default is "input.txt" in the script folder.
`config_file` contains the parameters that control the network expansion. Default is "config.txt" in the script folder.
The config file contains the following parameters:
- outputFile: the output file name containing the extended network and some edge attributes (see below).
- n: the number of new genes to add to the extended network. Default is 50.
- score: the STRING score threshold (between 0 and 1). Default is 0. It is applied as a filter after the interactions are retrieved, because no documentation was found on how to set the score in the API URL request.
- dbsource: the database source of the protein-protein interaction data. Currently stringdb or omnipathdb.
Default is stringdb. When dbsource is "stringdb", the script overlays omnipathdb data to provide information about directionality. When dbsource is "omnipathdb", it expands the network by retrieving all information annotated in omnipath for the query genes.
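
The a posteriori score filter can be pictured with a minimal sketch. It is written in Python for illustration only and is not the script's R code. The function name `filter_edges_by_score`, the edge fields `gene_a`, `gene_b` and `score`, and the inclusive `>=` comparison are all assumptions, and the gene names and scores are made-up placeholders.

```python
def filter_edges_by_score(edges, min_score=0.0):
    """Keep the edges whose score is at least min_score (0 to 1 scale).

    Hypothetical helper: the field names and the inclusive comparison
    are assumptions, not taken from get_extended_network.R.
    """
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0 and 1")
    return [edge for edge in edges if edge["score"] >= min_score]


# Made-up example edges; gene names and scores are placeholders.
example_edges = [
    {"gene_a": "GENE1", "gene_b": "GENE2", "score": 0.95},
    {"gene_a": "GENE1", "gene_b": "GENE3", "score": 0.40},
    {"gene_a": "GENE2", "gene_b": "GENE4", "score": 0.70},
]

# With the default threshold of 0 every edge is kept.
all_edges = filter_edges_by_score(example_edges)

# A stricter threshold drops the low-scoring edge.
kept = filter_edges_by_score(example_edges, min_score=0.7)
print([(edge["gene_a"], edge["gene_b"]) for edge in kept])
```

Filtering after retrieval means that all interactions are downloaded first and the threshold only decides which of them are kept.
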
The script writes an output file listing the gene A - gene B pairs (HGNC identifiers, and also the NCBI Entrez identifiers). If available in omnipath, the file also contains directionality: a value of 1 means the interaction is directed, and 0 means it is not. The column consensus_directionality reflects the fact that some evidence for a pair might be contradictory; it can be ignored for now. The column references contains the publications retrieved from omnipath.
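
As a rough illustration of how the directionality values could be used downstream, the sketch below separates directed from undirected pairs. It is Python written for this README, not part of the script. The tab-separated layout and the column names `gene_a`, `gene_b` and `is_directed` are assumptions, so check the header of the real output file before adapting it. The sample rows are made up.

```python
import csv
import io

# Made-up sample in an assumed tab-separated layout.
# The real output file may use different column names.
SAMPLE_OUTPUT = (
    "gene_a\tgene_b\tis_directed\n"
    "GENE1\tGENE2\t1\n"
    "GENE1\tGENE3\t0\n"
    "GENE2\tGENE4\t1\n"
)


def split_by_direction(handle, column="is_directed"):
    """Return (directed, undirected) lists of (gene_a, gene_b) pairs.

    A value of "1" in `column` is treated as directed; anything else
    is treated as undirected.
    """
    directed, undirected = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        pair = (row["gene_a"], row["gene_b"])
        if row[column] == "1":
            directed.append(pair)
        else:
            undirected.append(pair)
    return directed, undirected


directed, undirected = split_by_direction(io.StringIO(SAMPLE_OUTPUT))
print(len(directed), "directed;", len(undirected), "undirected")
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.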