Patrick May committed
# Files to download to this directory:
All files necessary to run the paralog annotation tool locally can be downloaded from the Zenodo repository [Paralog variant classification and scoring](https://zenodo.org/record/802891#.WTacLcmkJTY).
## Annotation files
```
wget https://zenodo.org/record/802891/files/CCDS_IDS.txt  
wget https://zenodo.org/record/802891/files/CCDS2Ensembl2HGNC.txt  
wget https://zenodo.org/record/802891/files/refSeqEnsCCDS.tsv  
```
## Family definitions and scores per family (para and parasub)
```
wget https://zenodo.org/record/802891/files/paraloggroups.HGNC.CCDS.tsv  
wget https://zenodo.org/record/802891/files/parasubfam.probs.tsv  
wget https://zenodo.org/record/802891/files/parascore.genes.tsv  
wget https://zenodo.org/record/802891/files/parasubscore.genes.tsv  
wget https://zenodo.org/record/802891/files/parascores.families.tsv  
wget https://zenodo.org/record/802891/files/parasubscores.subfamilies.tsv  
```

## Para scores per gene
One file with statistics per gene: `paraScores/<genename>.withStats.txt`

```
wget https://zenodo.org/record/802891/files/paraScores.tar.gz  
tar -zxvf paraScores.tar.gz  
```

## Parasub scores per gene
One file with statistics per gene: `parasubScores/<genename>.withStats.txt`

```
wget https://zenodo.org/record/802891/files/parasubgroups.detailed.tsv  
wget https://zenodo.org/record/802891/files/parasubScores.tar.gz  
tar -zxvf parasubScores.tar.gz  
```

## Para homology scores per gene
One file with statistics per gene: `parahomoScores/<genename>.withStats.txt`

```
wget https://zenodo.org/record/802891/files/parahomogroups.detailed.tsv  
wget https://zenodo.org/record/802891/files/parahomoScores.tar.gz  
tar -zxvf parahomoScores.tar.gz   
```
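The individual `wget` and `tar` commands above can also be collected into one loop. The following is a minimal sketch, not part of the tool: `fetch_all` is a hypothetical helper name, and the file list is simply copied from the commands in this README. It skips files that already exist, so an interrupted download should be deleted before rerunning.

```shell
# Base URL taken from the wget commands in this README.
BASE_URL="https://zenodo.org/record/802891/files"

# File names exactly as listed in the sections above.
FILES="CCDS_IDS.txt CCDS2Ensembl2HGNC.txt refSeqEnsCCDS.tsv
paraloggroups.HGNC.CCDS.tsv parasubfam.probs.tsv parascore.genes.tsv
parasubscore.genes.tsv parascores.families.tsv parasubscores.subfamilies.tsv
paraScores.tar.gz parasubgroups.detailed.tsv parasubScores.tar.gz
parahomogroups.detailed.tsv parahomoScores.tar.gz"

# fetch_all (hypothetical helper): download every file into the current
# directory and unpack the three score archives.
fetch_all() {
    for f in $FILES; do
        # Download only if the file is not already present.
        [ -e "$f" ] || wget "$BASE_URL/$f" || return 1
        case "$f" in
            *.tar.gz) tar -zxf "$f" || return 1 ;;
        esac
    done
}
```

Run `fetch_all` from the directory that should receive the files.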
After these steps, the `data` directory contains three directories, one per score type, and three additional files listing the gene names for which each score is available:
```
para.geneIds.txt        # Gene names for para score
paraScores              # Directory with gene-specific files with para score

parasub.geneIds.txt     # Gene names for parasub score
parasubScores           # Directory with gene-specific files with parasub score

parahomo.geneIds.txt    # Gene names for parahomo score
parahomoScores          # Directory with gene-specific files with parahomo score
```
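A quick way to confirm that the layout listed above is in place is a small check like the one below. This is a sketch under the assumption that the six names appear exactly as shown; `check_layout` is a hypothetical helper, not part of the tool.

```shell
# check_layout [DIR] (hypothetical helper): report any of the three
# *.geneIds.txt files or *Scores directories missing from DIR
# (default: current directory). Returns non-zero if anything is missing.
check_layout() {
    dir=${1:-.}
    status=0
    for score in para parasub parahomo; do
        if [ ! -f "$dir/$score.geneIds.txt" ]; then
            echo "missing file: $dir/$score.geneIds.txt" >&2
            status=1
        fi
        if [ ! -d "$dir/${score}Scores" ]; then
            echo "missing directory: $dir/${score}Scores" >&2
            status=1
        fi
    done
    return $status
}
```

For example, `check_layout data && echo "layout OK"` prints the confirmation only when all six entries exist.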

# File formats
For each of the three scores (para, parasub, parahomo), the `.geneIds.txt` file lists the gene names for which the corresponding score is available.  
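
Whether a score exists for a given gene can then be checked with `grep`. The sketch below assumes that each `.geneIds.txt` file holds one gene name per line, which this README does not state explicitly; `has_score` is a hypothetical helper name.

```shell
# has_score SCORE GENE [DIR] (hypothetical helper): succeed if GENE is
# listed in DIR/SCORE.geneIds.txt (DIR defaults to the current directory).
# Assumes one gene name per line.
has_score() {
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$2" "${3:-.}/$1.geneIds.txt"
}
```

For example, `has_score para KRIT1 data && echo "para score available"`.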

For each gene, there are two different files in the corresponding score directory:

1. `<genename>.txt`, e.g. `KRIT1.txt`

(para score per amino acid, in the format `AA position<tab>AA<tab>para_score`)
 ```
1       M       0
2       G       0
3       N       0
4       P       0
..
```

2. `<genename>.withStats.txt`, e.g. `KRIT1.withStats.txt`

(format: `AA position<tab>AA<tab>para_score<tab>score-median<tab>(score-median)/STD<tab>score-mean<tab>(score-mean)/STD`; lines starting with `#` hold the per-gene summary statistics)
```                 
# GENE: KRIT1
# TOTALSCORE=2883
# LEN=736
# MEAN(SCORE)=3.92
# STD(SCORE)=4.42
# MEDIAN(SCORE)=0
# SCORE>0=0.47
# MEAN(GREATER>0)=8.28 
# STD(SCORE>0)=2.26
# MEDIAN(SCORE>0)=8
# MAXSCORES=0.14
#POS    AA      SCORE   SCORE-MEDIAN    (SCORE-MEDIAN)/STD      SCORE-MEAN      (SCORE-MEAN)/STD
1       M       0       0       0.00    -3.92   -0.89
2       G       0       0       0.00    -3.92   -0.89
3       N       0       0       0.00    -3.92   -0.89
4       P       0       0       0.00    -3.92   -0.89
5       E       0       0       0.00    -3.92   -0.89
..
82      A       10      10      2.26    6.08    1.38
83      N       10      10      2.26    6.08    1.38
84      Q       8       8       1.81    4.08    0.92
85      G       11      11      2.49    7.08    1.60
86      I       5       5       1.13    1.08    0.24
..
```
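
Both file formats can be read with plain `awk`. The helpers below are a sketch, not part of the tool, and their names (`score_at`, `header_value`, `zscore_at`) are hypothetical. They assume whitespace-separated columns and `# KEY=value` header lines as in the examples above. Because the header values are rounded to two decimals, a recomputed value may differ from the file's last column in the final digit.

```shell
# score_at FILE POS: print the raw score at one amino-acid position.
# Works for <genename>.txt and <genename>.withStats.txt, since the first
# three columns are the same; lines starting with '#' are skipped.
score_at() {
    awk -v pos="$2" '!/^#/ && $1 == pos { print $3; exit }' "$1"
}

# header_value FILE KEY: print the value of a "# KEY=value" header line.
header_value() {
    awk -v key="$2" \
        'index($0, "# " key "=") == 1 { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# zscore_at FILE POS: recompute (score - mean) / STD from the header.
zscore_at() {
    mean=$(header_value "$1" 'MEAN(SCORE)')
    std=$(header_value "$1" 'STD(SCORE)')
    score=$(score_at "$1" "$2")
    awk -v s="$score" -v m="$mean" -v sd="$std" \
        'BEGIN { printf "%.2f\n", (s - m) / sd }'
}
```

With the KRIT1 excerpt above, `zscore_at KRIT1.withStats.txt 82` gives (10 - 3.92) / 4.42, which rounds to 1.38 and matches the last column of that row.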