Commit c29996d9 authored by Anna Buschart

include script for tree-based

parent 74b0146d
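#!/usr/bin/perl
# Extract the sequences listed in a query table from a FASTA file and
# print each one under the header ">sample_cluster_gene".
use strict;
use warnings;
use Bio::DB::Fasta;

my $fastaFile = shift;
my $queryFile = shift;    # tab-separated columns: 0: gene, 1: sample, 2: cluster
my $db = Bio::DB::Fasta->new($fastaFile);

open(my $in, '<', $queryFile) or die "Cannot open $queryFile: $!\n";
while (<$in>) {
    chomp;
    my @fields = split /\t/;
    my $seq = $fields[0];
    # Skip blank lines and the header row.
    next if !defined $seq || $seq eq "gene";
    my $sequence = $db->seq($seq);
    if (!defined $sequence) {
        print STDERR "Sequence $seq not found.\n";
        next;
    }
    my $lib     = $fields[1];
    my $cluster = $fields[2];
    print ">${lib}_${cluster}_$seq\n", "$sequence\n";
}
close($in);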
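To make the expected input and output concrete, here is a minimal sketch of the same extract-and-append pattern on toy data. It is not the committed script: the lookup is an `awk` stand-in, so it runs without BioPerl, and every file name and identifier (`S1.faa`, `query.tsv`, `markers.faa`, `geneA`, `c1`) is hypothetical. It assumes single-line FASTA records whose header is exactly the gene ID.

```shell
#!/bin/sh
# Toy illustration only: file names and IDs are made up, and awk stands in
# for the Bio::DB::Fasta lookup done by the Perl script.
set -eu
work=$(mktemp -d)

# Gene predictions of one hypothetical sample.
printf '>geneA\nMKT\n>geneB\nMLV\n' > "$work/S1.faa"

# Query table: header row, then gene, sample, cluster (tab-separated).
printf 'gene\tsample\tcluster\ngeneA\tS1\tc1\ngeneX\tS1\tc2\n' > "$work/query.tsv"

# First file fills seq[] keyed by gene ID; second file is the query table.
extract() {
    awk -F '\t' '
        NR == FNR { if (/^>/) { id = substr($0, 2) } else { seq[id] = seq[id] $0 }; next }
        $1 == "gene" { next }                       # skip the header row
        !($1 in seq) { print "Sequence " $1 " not found." > "/dev/stderr"; next }
        { print ">" $2 "_" $3 "_" $1; print seq[$1] }
    ' "$1" "$2"
}

# Append, so repeated calls (one per sample) accumulate in one FASTA file.
extract "$work/S1.faa" "$work/query.tsv" >> "$work/markers.faa"
cat "$work/markers.faa"
```

With this toy input, `geneA` is written under the header `>S1_c1_geneA`, while `geneX`, which is absent from the FASTA file, is reported on standard error and skipped.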
......@@ -6,7 +6,7 @@ In the first step, an R-script is run to retrieve all complete genes annotated a
Rscript 150816_getPhyloMarkers.R
```
-This table is then used as input to a perl script which extracts the genes of interest from the fasta file containing all gene predictions of a sample (names of samples are part of the combinedIds file). We append the output of the script to the same file, so we have one fasta file with all complete genes for each class of marker genes.
+This table is then used as input to a perl [script](fastaProteinExtractAddSampleCluster.pl) which extracts the genes of interest from the fasta file containing all gene predictions of a sample (names of samples are part of the combinedIds file). We append the output of the script to the same file, so we have one fasta file with all complete genes for each class of marker genes.
```
for lib in `cut -f 2 combinedIds`
......