# Files to download to this directory:
All files necessary to run the paralog annotation tool locally can be downloaded from the Zenodo repository [Paralog variant classification and scoring](https://zenodo.org/record/802891#.WTacLcmkJTY).
```
wget https://zenodo.org/record/802891/files/CCDS_IDS.txt
wget https://zenodo.org/record/802891/files/CCDS2Ensembl2HGNC.txt
wget https://zenodo.org/record/802891/files/refSeqEnsCCDS.tsv
```
## Family definitions and scores per family (para and parasub)
```
wget https://zenodo.org/record/802891/files/paraloggroups.HGNC.CCDS.tsv
wget https://zenodo.org/record/802891/files/parasubfam.probs.tsv
wget https://zenodo.org/record/802891/files/parascore.genes.tsv
wget https://zenodo.org/record/802891/files/parasubscore.genes.tsv
wget https://zenodo.org/record/802891/files/parascores.families.tsv
wget https://zenodo.org/record/802891/files/parasubscores.subfamilies.tsv
```
## Para scores per gene
One file with stats per gene: `paraScores/<genename>.withStats.txt`
```
wget https://zenodo.org/record/802891/files/paraScores.tar.gz
tar -zxvf paraScores.tar.gz
```
## Parasub scores per gene
One file with stats per gene: `parasubScores/<genename>.withStats.txt`
```
wget https://zenodo.org/record/802891/files/parasubgroups.detailed.tsv
wget https://zenodo.org/record/802891/files/parasubScores.tar.gz
tar -zxvf parasubScores.tar.gz
```
## Para homology scores per gene
One file with stats per gene: `parahomoScores/<genename>.withStats.txt`
```
wget https://zenodo.org/record/802891/files/parahomogroups.detailed.tsv
wget https://zenodo.org/record/802891/files/parahomoScores.tar.gz
tar -zxvf parahomoScores.tar.gz
```
After these steps, the `data` directory contains three directories, one for each of the three score types, and three additional files listing the gene names for which scores are available:
```
para.geneIds.txt # Gene names for para score
paraScores # Directory with gene-specific files with para score
parasub.geneIds.txt # Gene names for parasub score
parasubScores # Directory with gene-specific files with parasub score
parahomo.geneIds.txt # Gene names for parahomo score
parahomoScores # Directory with gene-specific files with parahomo score
```
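As a minimal sketch of how this layout could be used from a script, the following Python helpers check which score types are available for a gene and build the path to its score file. The function names are hypothetical and not part of the annotation tool, and the code assumes that each `.geneIds.txt` file holds one gene name per line, which is not stated above.

```python
from pathlib import Path

# The three score types named in the listing above.
SCORE_TYPES = ("para", "parasub", "parahomo")


def load_gene_ids(data_dir, score_type):
    """Read <score_type>.geneIds.txt into a set of gene names.

    Assumption: one gene name per line; blank lines are ignored.
    """
    path = Path(data_dir) / f"{score_type}.geneIds.txt"
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def score_file(data_dir, score_type, gene):
    """Build the path to the per-gene file with stats for one score type."""
    return Path(data_dir) / f"{score_type}Scores" / f"{gene}.withStats.txt"


def available_scores(data_dir, gene):
    """Return the score types whose gene list contains the given gene."""
    return [s for s in SCORE_TYPES if gene in load_gene_ids(data_dir, s)]
```

For example, `available_scores("data", "KRIT1")` would return the score types whose gene list mentions KRIT1, and `score_file("data", "para", "KRIT1")` would give `data/paraScores/KRIT1.withStats.txt`.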
# File formats
For each of the three scores (para, parasub, parahomo), the `.geneIds.txt` file lists the gene names for which the corresponding score is available.
For each gene there are two different files in the corresponding score directory:
1. `<genename>.txt`, e.g. `KRIT1.txt`
(para score per amino acid, in the format `AA position<tab>AA<tab>para_score`)
```
1 M 0
2 G 0
3 N 0
4 P 0
..
```
2. `<genename>.withStats.txt`, e.g. `KRIT1.withStats.txt`
(format: `AA position<tab>AA<tab>para_score<tab>score-median<tab>(score-median)/STD<tab>score-mean<tab>(score-mean)/STD`)
```
# GENE: KRIT1
# TOTALSCORE=2883
# LEN=736
# MEAN(SCORE)=3.92
# STD(SCORE)=4.42
# MEDIAN(SCORE)=0
# SCORE>0=0.47
# MEAN(GREATER>0)=8.28
# STD(SCORE>0)=2.26
# MEDIAN(SCORE>0)=8
# MAXSCORES=0.14
#POS AA SCORE SCORE-MEDIAN (SCORE-MEDIAN)/STD SCORE-MEAN (SCORE-MEAN)/STD
1 M 0 0 0.00 -3.92 -0.89
2 G 0 0 0.00 -3.92 -0.89
3 N 0 0 0.00 -3.92 -0.89
4 P 0 0 0.00 -3.92 -0.89
5 E 0 0 0.00 -3.92 -0.89
..
82 A 10 10 2.26 6.08 1.38
83 N 10 10 2.26 6.08 1.38
84 Q 8 8 1.81 4.08 0.92
85 G 11 11 2.49 7.08 1.60
86 I 5 5 1.13 1.08 0.24
..
```
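The `.withStats.txt` layout shown above can be read with a few lines of Python. The sketch below is not part of the annotation tool and its function names are hypothetical. It collects the `# KEY=VALUE` header lines into a dictionary, reads the first three columns of each data row, and recomputes the four derived columns. It assumes the normalisation uses `STD(SCORE)` from the header, which is consistent with the KRIT1 values shown (for position 82: 10 / 4.42 ≈ 2.26 and (10 - 3.92) / 4.42 ≈ 1.38), and it splits on any whitespace so that tab-separated files are handled as well.

```python
import io


def parse_with_stats(handle):
    """Parse a <genename>.withStats.txt stream into (header, rows)."""
    header = {}
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if "=" in body:
                # Summary lines such as "# MEAN(SCORE)=3.92"
                key, _, value = body.partition("=")
                header[key.strip()] = float(value)
            elif body.startswith("GENE:"):
                header["GENE"] = body.split(":", 1)[1].strip()
            # Any other comment (the "#POS AA SCORE ..." column header) is skipped.
            continue
        fields = line.split()
        rows.append((int(fields[0]), fields[1], float(fields[2])))
    return header, rows


def derived_columns(score, header):
    """Recompute the four derived columns from a raw score.

    Assumption: both normalised columns divide by STD(SCORE).
    """
    median = header["MEDIAN(SCORE)"]
    mean = header["MEAN(SCORE)"]
    std = header["STD(SCORE)"]
    return (
        score - median,
        round((score - median) / std, 2),
        round(score - mean, 2),
        round((score - mean) / std, 2),
    )


# A shortened copy of the KRIT1 example, used only for illustration.
SAMPLE = """\
# GENE: KRIT1
# MEAN(SCORE)=3.92
# STD(SCORE)=4.42
# MEDIAN(SCORE)=0
#POS AA SCORE SCORE-MEDIAN (SCORE-MEDIAN)/STD SCORE-MEAN (SCORE-MEAN)/STD
1 M 0 0 0.00 -3.92 -0.89
82 A 10 10 2.26 6.08 1.38
"""

header, rows = parse_with_stats(io.StringIO(SAMPLE))
for pos, aa, score in rows:
    print(pos, aa, derived_columns(score, header))
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper, for example `open("paraScores/KRIT1.withStats.txt")`.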