Commit 39f03351 authored by Anna Buschart

links in tree text

parent 380da38e
This workflow makes use of the functional annotation data and the completeness information stored in the [mongo database](mongo-database.md), collects the amino acid sequences of all complete phylogenetic marker gene predictions in all samples and builds a phylogenetic tree based on multiple sequence alignment for each class of marker genes. The information about membership of the marker genes in a contig [cluster](automatic-clustering.md) (binned reconstructed population-level genome) is retained, so closely related reconstructed genomes from different samples can be identified as such.
In the first step, an R-script is run to retrieve all complete genes annotated as one of the mOTU [marker genes](annotate-phylogenetic-marker-genes.md) - COG0012, COG0016, COG0018, COG0172, COG0215, COG0495, COG0525, COG0533, COG0541, COG0552 - or __rpoB__ (TIGR02013) from the [mongo database](mongo-database.md). It returns a table with the gene IDs, the sample IDs and the contig cluster IDs of the complete marker genes for each sample and class of marker gene. (The script is documented here, but it depends on the specific file structure used in this project, so don't expect it to run anywhere else).
In the first step, an R-script is run to retrieve all complete genes annotated as one of the mOTU [marker genes](annotate-phylogenetic-marker-genes.md) - COG0012, COG0016, COG0018, COG0172, COG0215, COG0495, COG0525, COG0533, COG0541, COG0552 - or __rpoB__ (TIGR02013) from the [mongo database](mongo-database.md). It returns a table with the gene IDs, the sample IDs and the contig cluster IDs of the complete marker genes for each sample and class of marker gene. (The script is documented [here](150816_getPhyloMarkers.R), but it depends on the specific file structure used in this project, so don't expect it to run anywhere else).
```
Rscript 150816_getPhyloMarkers.R
@@ -55,4 +55,4 @@ for(mark in markers){
This will write a tab-separated table for each group of closely related genes. The tables contain the ID of samples and the ID of the contig clusters (population-level genome) which contain a phylogenetic marker gene from the group of related genes. The files are named by a number that derives from the phylogenetic tree and the marker gene.
Another useful package for phylogenetic tree visualization in R is [geiger](http://www.webpages.uidaho.edu/~lukeh/software.html). Using this and ape, we can visualize a tree for each marker gene and colour it according to the sample of origin. Or we can extract specific parts of the tree, like for example leaves formed by phylogenetic marker genes which belong to contig clusters with weak classification. Both examples are performed in the script _150819_MUST_tree.R_, which contains some nice functions for visualization. The script is however based on the MuSt sample IDs and MuSt file structure and should not be run as is on other data sets.
Another useful package for phylogenetic tree visualization in R is [geiger](http://www.webpages.uidaho.edu/~lukeh/software.html). Using this and ape, we can visualize a tree for each marker gene and colour it according to the sample of origin. Or we can extract specific parts of the tree, like for example leaves formed by phylogenetic marker genes which belong to contig clusters with weak classification. Both examples are performed in the script [`150819_MUST_tree.R`](150819_MUST_tree.R), which contains some nice functions for visualization. The script is however based on the MuSt sample IDs and MuSt file structure and should not be run as is on other data sets.
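As a rough illustration of the two visualization ideas above, the following is a minimal sketch using only ape. It is not taken from `150819_MUST_tree.R`. The toy tree, the tip-label scheme `<sample>_<geneID>`, the sample names and the selection of "weak" tips are all hypothetical assumptions made for the example.

```r
library(ape)

# Hypothetical toy tree; tip labels are assumed to look like "<sample>_<geneID>".
tree <- read.tree(text = "((M01_g1:0.1,M02_g7:0.12):0.3,(M01_g4:0.2,M03_g2:0.25):0.1);")

# Derive the sample of origin from each tip label (assumed naming scheme).
samples <- sub("_.*$", "", tree$tip.label)

# One colour per sample, then look up the colour for every tip by sample name.
sampleCols <- setNames(rainbow(length(unique(samples))), unique(samples))
tipColours <- sampleCols[samples]

# Plot the tree with tips coloured by sample of origin.
plot(tree, tip.color = tipColours, cex = 0.8)
legend("topleft", legend = names(sampleCols), text.col = sampleCols, bty = "n")

# Extract the part of the tree formed by a chosen set of leaves, e.g. marker
# genes from contig clusters with weak classification (hypothetical selection).
weakTips <- c("M01_g4", "M03_g2")
subTree <- keep.tip(tree, weakTips)
plot(subTree)
```

With real data, the sample of origin would more likely come from the table of gene IDs, sample IDs and contig cluster IDs than from parsing tip labels.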