# **get_extended_network.R** Script for extending the original list of disease genes using PPI data
Run it as `Rscript --vanilla get_extended_network.R <input_filename> <config_file>`.

`input_filename` should contain a list of genes as HGNC symbols, one entry per line. Default is "input.txt" in the script folder.

`config_file` contains the parameters that control the expansion. Default is "config.txt" in the script folder.

The config file contains the following parameters:
- outputFile: the output file name containing the extended network and some edge attributes (see below).
- n: the number of new genes in the extended network. By default 50.
- score: the minimum STRING score (between 0 and 1). By default 0. It is applied as a filter after the interactions are retrieved, because I was not able to find documentation on how to pass the score in the API URL request.
- dbsource: the database source of the protein-protein interaction data. Currently, stringdb or omnipathdb. By default stringdb. When the source is "stringdb", the script overlays omnipathdb data to provide information about directionality. When the source is "omnipathdb", it expands the network by retrieving all information annotated in omnipath for the query genes.
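
The a posteriori filtering described for the score parameter can be sketched in base R as follows. This is only an illustration under stated assumptions, not the script's actual code: the helper `filter_by_score`, the data frame `edges`, its column names (`gene_a`, `gene_b`, `score`) and the toy values are all hypothetical.

```r
# Hypothetical helper: keep only interactions whose score is at least min_score.
# The column name "score" is an assumption made for this illustration.
filter_by_score <- function(edges, min_score = 0) {
  stopifnot(min_score >= 0, min_score <= 1)
  edges[edges$score >= min_score, , drop = FALSE]
}

# Toy data standing in for interactions already retrieved from the database.
# Gene names and scores are placeholders, not real values.
edges <- data.frame(
  gene_a = c("GENE1", "GENE1", "GENE2"),
  gene_b = c("GENE3", "GENE4", "GENE5"),
  score  = c(0.99, 0.40, 0.95),
  stringsAsFactors = FALSE
)

# Apply the threshold after retrieval, as the score parameter does
filtered <- filter_by_score(edges, min_score = 0.9)
```

With the default `min_score = 0` every interaction is kept, which matches the default value of the score parameter.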
The script writes an output file with the pairs gene A - gene B (as HGNC identifiers and also as NCBI Entrez identifiers). If available in omnipath, it contains directionality (1 means there is a direction, 0 means no direction). The column consensus_directionality reflects the fact that some evidence for a pair might be contradictory; it can be ignored for now. The column references contains the publications retrieved from omnipath.
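
As a rough sketch of how the directionality values could be used downstream, the base R snippet below splits a table of pairs into directed and undirected ones. The column names (`gene_a`, `gene_b`, `directionality`), the tab separator and the toy rows are assumptions for illustration; check the header of your actual output file before adapting it.

```r
# Toy stand-in for the output file. The real column names and separator
# may differ; these are assumptions for illustration only.
output_text <- paste(
  "gene_a\tgene_b\tdirectionality",
  "GENE1\tGENE2\t1",
  "GENE3\tGENE4\t0",
  sep = "\n"
)

# Parse the tab-separated text as a data frame
pairs <- read.delim(text = output_text, stringsAsFactors = FALSE)

# 1 means a direction is annotated, 0 means no direction
directed   <- pairs[pairs$directionality == 1, , drop = FALSE]
undirected <- pairs[pairs$directionality == 0, , drop = FALSE]
```

To read a real output file instead, replace `text = output_text` with the path given as outputFile in the config.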