Commit d92e3649 authored by Anna Buschart

links in proteome text

parent a23dcc42
@@ -16,21 +16,21 @@ vcftools --gzvcf DNARNAonContigs.vcf.gz --remove-filtered GOF --remove-filtered
bgzip -d DNARNAonContigsFiltered.vcf.gz
```
For the next step, we adapted code written by Nic Pinel (http://www.eafit.edu.co/docentes-investigadores/Paginas/nicolas-pinel.aspx) for his project with Emilie Muller (http://www.nature.com/ncomms/2014/141126/ncomms6603/full/ncomms6603.html). Besides the .vcf file, this [script](variant_annotateRepairedTab.pl) takes two further inputs: the .tab output of Prodigal (https://github.com/hyattpd/prodigal/releases/), which had been used to call the genes, and the assembled contigs, which had been wrapped to 80 characters per line using fasta_formatter from the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/).
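The contig wrapping corresponds to fasta_formatter's `-w` option (maximum sequence line width), i.e. presumably a call such as `fasta_formatter -i contigs.fa -o contigs.formatted.fa -w 80`, where the input name `contigs.fa` is an assumption. Where the FASTX-Toolkit is not available, the following minimal sketch with the hypothetical helper `wrap_fasta` should give the same 80-column layout for plain FASTA input:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the original pipeline): wrap FASTA
# sequence lines to a fixed width (default 80), similar to `fasta_formatter -w 80`.
# Reads FASTA on stdin and writes the wrapped FASTA to stdout.
wrap_fasta() {
  local width="${1:-80}"
  awk -v w="$width" '
    # Print the buffered sequence in chunks of w characters, then clear it.
    function flush(   i) {
      for (i = 1; i <= length(seq); i += w) print substr(seq, i, w)
      seq = ""
    }
    # Header line: flush the pending sequence, then print the header.
    /^>/ { flush(); print; next }
    # Sequence line: strip whitespace and append to the buffer.
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END { flush() }
  '
}
```

Usage, with the same assumed input name: `wrap_fasta 80 < contigs.fa > contigs.formatted.fa`. The sketch also strips spaces, tabs and carriage returns from sequence lines.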
```
perl variant_annotateRepairedTab.pl -v DNARNAonContigsFiltered.vcf -a gene.tab -s contigs.formatted.fa -p
```
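To build the variant sequences, the script has to turn each VCF record into a modified copy of the affected gene. Its exact logic is not described here. The core substitution step can be sketched as follows, with `apply_variant` as a hypothetical helper. It replaces the REF allele at the 1-based VCF `POS` with the ALT allele and rejects records whose REF does not match the sequence:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not taken from variant_annotateRepairedTab.pl):
# apply one VCF-style variant (1-based POS, REF, ALT) to a sequence string.
# Replacing the whole REF allele with ALT covers SNVs as well as simple
# insertions/deletions written with the VCF anchor base.
apply_variant() {
  local seq="$1" pos="$2" ref="$3" alt="$4"
  # VCF positions are 1-based; bash substring offsets are 0-based.
  local start=$((pos - 1))
  # Refuse to apply the variant if the sequence does not carry REF at POS.
  if [ "${seq:start:${#ref}}" != "$ref" ]; then
    echo "REF allele $ref not found at position $pos" >&2
    return 1
  fi
  # Prefix before POS + ALT allele + remainder after the REF allele.
  printf '%s\n' "${seq:0:start}${alt}${seq:start+${#ref}}"
}
```

For example, `apply_variant ACGTACGT 3 G T` prints `ACTTACGT`. Mapping contig coordinates onto genes, reverse-complementing minus-strand genes and translating to amino acids are left out of this sketch.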
This [script](variant_annotateRepairedTab.pl) outputs two FASTA files with the nucleotide and amino acid sequences of all predicted proteins, covering both the original predictions and the variants, as well as a .tab file with the positions of the variant genes in Prodigal style. The resulting .tab file was concatenated with the original .tab file, and the combined table was used to [remove](trypsinStartEnd.pl) incomplete protein fragments that would not form tryptic peptides.
```
# Combine the original and the variant gene tables (">" overwrites, so a rerun does not append duplicates)
cat gene.tab contigs.formatted.variants.tab > genes.variants.tab
# Remove incomplete fragments that would not form tryptic peptides
perl trypsinStartEnd.pl contigs.formatted.variants.faa genes.variants.tab > genes.variants.endsRemoved.faa
```
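trypsinStartEnd.pl is only linked here, and the output name `genes.variants.endsRemoved.faa` suggests that it trims fragment ends rather than discarding whole proteins. The following is a hypothetical sketch of such trimming. It assumes the classic trypsin rule (cleavage after K or R unless the next residue is P) and assumes that it is already known which ends of a fragment are incomplete, e.g. for genes running off a contig edge:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not trypsinStartEnd.pl itself): trim an amino acid
# fragment back to its outermost trypsin cleavage sites.
# Arguments: sequence, trim_start (1 = N-terminus incomplete), trim_end (1 = C-terminus incomplete).
# Prints the trimmed sequence, or an empty line if no complete tryptic stretch remains.
trim_to_tryptic() {
  local seq="$1" trim_start="${2:-1}" trim_end="${3:-1}"
  awk -v s="$seq" -v ts="$trim_start" -v te="$trim_end" '
    # Cleavage site after position i: residue i is K or R and residue i+1 is not P.
    function is_site(i) {
      return (substr(s, i, 1) ~ /^[KR]$/) && (substr(s, i + 1, 1) != "P")
    }
    BEGIN {
      first = 1
      last = length(s)
      if (ts == 1) {
        # Incomplete N-terminus: drop residues up to and including the first site.
        first = last + 1
        for (i = 1; i <= last; i++) if (is_site(i)) { first = i + 1; break }
      }
      if (te == 1) {
        # Incomplete C-terminus: drop residues after the last site.
        stop = first - 1
        for (j = last; j >= first; j--) if (is_site(j)) { stop = j; break }
        last = stop
      }
      if (last >= first) { print substr(s, first, last - first + 1) } else { print "" }
    }'
}
```

For instance, `trim_to_tryptic MAGKLLRPSTRVEKAA 1 1` prints `LLRPSTRVEK`. The N-terminal `MAGK` and the C-terminal `AA` are cut at the outermost cleavage sites, and the `RP` bond is left intact. How the start/end flags would be read from genes.variants.tab is not shown, because the column layout of that table is not described here.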
Before the microbial protein predictions were concatenated with the human proteins, they were [renamed](rename4proteomics.pl) to have a unique ID starting with `sp|1xxxxxxx|`.
```
perl rename4proteomics.pl genes.variants.endsRemoved.faa 1 > genes.variants.endsRemoved.Renamed.faa
```
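rename4proteomics.pl is only linked here. As a minimal sketch of the renaming, the hypothetical `rename_for_proteomics` below numbers the records consecutively and writes headers of the form `sp|1xxxxxxx|`. It assumes that the numeric argument is the leading digit of the accession, that the counter is zero-padded to seven digits, and that the original header text is kept after the second `|`:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not rename4proteomics.pl itself): give every FASTA record
# a unique UniProt-style accession sp|<prefix><7-digit counter>|.
# Reads FASTA on stdin and writes the renamed FASTA to stdout.
rename_for_proteomics() {
  local prefix="${1:-1}"
  awk -v p="$prefix" '
    # Header: new accession followed by the old header text (without ">").
    /^>/ { n++; printf ">sp|%s%07d|%s\n", p, n, substr($0, 2); next }
    # Sequence lines are passed through unchanged.
    { print }
  '
}
```

Usage: `rename_for_proteomics 1 < genes.variants.endsRemoved.faa > genes.variants.endsRemoved.Renamed.faa`. With a seven-digit counter the IDs stay unique and fixed-width for up to 9,999,999 records; beyond that the counter would grow wider.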