- The threshold score is by default set to 0 (although OpenTargets recommend 0.6 as a reasonable
threshold).
## Ensembl
The script [vepAllInOne.py](bh19-rare-diseases/associations/vepmining) retrieves information about [allele frequencies](https://en.wikipedia.org/wiki/Allele_frequency) in several populations available in the [Ensembl](http://www.ensembl.org/index.html) database. This is done through [this endpoint](https://rest.ensembl.org/documentation/info/vep_id_post) of the Ensembl API.
```commandline
python3 vepAllInOne.py -f <single-column rsID input file> -t <float, allele frequency threshold below which a variant is retained>
```
It takes as *input* a file with a single column of rs# and gives as *output*, on standard output, a single column of rs#.
The output contains the variants from the input that are either not described in the genomic databases contained in Ensembl or that have an allele frequency >= the threshold given on the command line in at least one population. For example, a threshold of 0.10 filters out of the input file all variants with a minor allele frequency equal to or greater than 10% in at least one of the populations described in Ensembl. Variants not present in Ensembl are also filtered out, under the assumption that they have never been described so far.
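The selection rule described above can be sketched as follows. This is a minimal illustration, not the code of `vepAllInOne.py`: the function name `select_reported`, the simplified `frequencies` mapping (rsID to per-population allele frequencies, or `None` when a variant is not described) and the sample rsIDs are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical, simplified data shape: rsID -> {population: allele frequency}.
# A missing key or a value of None stands for "not described in Ensembl".
FrequencyTable = Dict[str, Optional[Dict[str, float]]]


def select_reported(rsids: Iterable[str],
                    frequencies: FrequencyTable,
                    threshold: float) -> List[str]:
    """Return the rsIDs that the rule above writes to the output."""
    reported = []
    for rsid in rsids:
        populations = frequencies.get(rsid)
        if populations is None:
            # Not described in the database: reported.
            reported.append(rsid)
        elif any(af >= threshold for af in populations.values()):
            # At least one population reaches the threshold: reported.
            reported.append(rsid)
        # Otherwise every known frequency is below the threshold
        # (an empty table also ends up here; that choice is an assumption).
    return reported


# Made-up example values, for illustration only.
example_frequencies: FrequencyTable = {
    "rs1": {"afr": 0.25, "eur": 0.02},  # common in one population
    "rs2": {"afr": 0.01, "eur": 0.03},  # rare everywhere
    "rs3": None,                        # not described
}
example_output = select_reported(["rs1", "rs2", "rs3", "rs4"],
                                 example_frequencies, threshold=0.10)
print("\n".join(example_output))
```

With a threshold of 0.10, `rs2` is the only variant whose frequencies all stay below the threshold, so it is the only one absent from the printed column.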
USAGE: `python3 vep.py -t <threshold below which a variant is retained> -f <disgenet.test.out>`

Sends a request to https://rest.ensembl.org/documentation/info/vep_id_post to retrieve information about allele frequencies in the general population (if available) for a list of rsIDs.

INPUT: an output file from disgenet.py or, in general, any file that contains rsIDs in the form
"rs1060501145",
(one or more rsIDs per line; each rsID must be quoted).
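As a rough sketch of the two steps just described, the snippet below extracts quoted rsIDs from text in the form shown above and prepares a POST request for the VEP endpoint without sending it. It is not taken from `vep.py`: the helper names `extract_rsids` and `build_vep_request`, the `human` species segment in the URL, the removal of duplicate IDs and the sample rsID `rs123` are assumptions for illustration.

```python
import json
import re
import urllib.request
from typing import List

# POST vep/:species/id endpoint; the species "human" is an assumption.
VEP_URL = "https://rest.ensembl.org/vep/human/id"

# Matches rsIDs written with surrounding double quotes, e.g. "rs1060501145".
QUOTED_RSID = re.compile(r'"(rs\d+)"')


def extract_rsids(text: str) -> List[str]:
    """Collect quoted rsIDs in order of appearance, dropping duplicates."""
    seen = set()
    ordered = []
    for rsid in QUOTED_RSID.findall(text):
        if rsid not in seen:
            seen.add(rsid)
            ordered.append(rsid)
    return ordered


def build_vep_request(rsids: List[str]) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request carrying the rsIDs."""
    payload = json.dumps({"ids": rsids}).encode("utf-8")
    return urllib.request.Request(
        VEP_URL,
        data=payload,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


# "rs123" is a made-up ID used only to show several IDs on one line.
sample_text = '"rs1060501145",\n"rs123", "rs1060501145",\n'
rsids = extract_rsids(sample_text)
request = build_vep_request(rsids)
print(rsids)
```

Passing `request` to `urllib.request.urlopen` would send it. The Ensembl documentation limits how many IDs a single POST may carry, so a long input list would need to be split into batches.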
USAGE: `python3 vepAllInOne.py -f <single-column rsID input file> -t <allele frequency threshold below which a variant is retained>`

INPUT: a single-column file of rsIDs.

OUTPUT: on standard output, a single column of rsIDs of variants that are either not described in the genomic databases contained in VEP or for which at least one population described in VEP has an allele frequency >= the threshold given on the command line.