Commit 74cc58ab authored by St. Elmo's avatar St. Elmo
Browse files

added lots of model construction stuff

parent 5382e0f4
......@@ -5,6 +5,7 @@ makedocs(
authors = "St. Elmo Wilken",
pages = [
"Home" => "index.md",
"Model Structure" => "model_structure.md",
"Model IO" => "io.md",
"Model Construction" => "model_construction.md",
"Optimization Based Analysis" => "basic_analysis.md",
......
# Optimization Based Analysis
## Convenience functions
A selection of standard COBRA functions have been implemented to make basic model analysis more convenient.
### FBA
### pFBA
### Escher integration
## Building your own optimization analysis script
CobraTools.jl makes it simple to access the
\ No newline at end of file
......@@ -4,6 +4,7 @@
## Contents
```@contents
Pages = [
"model_structure.md",
"io.md",
"model_construction.md",
"basic_analysis.md",
......@@ -23,12 +24,8 @@ Some of the features used in this package require external programs to be instal
* The Equilibrator interface requires that the Equilibrator-API has been installed and can be accessed through Julia's PyCall package. Refer to the [Equilibrator-API website](https://gitlab.com/equilibrator/equilibrator-api) for installation instructions. Within Julia, if you can call `pyimport("equilibrator_api")` successfully, then you will be able to use the functions exposed here. To actually use the functions insert `using PyCall` in your main level script (before or after `using CobraTools`).
* To extract turnover numbers, Km, Kcat/Km and Ki from the Brenda database, you will need to download the database as a txt file [available here](https://www.brenda-enzymes.org/download_brenda_without_registration.php) (~250 MB).
The optimization solvers are implemented through `JuMP` and thus this package should be solver agnostic. All tests are conducted using `Gurobi.jl` but other solvers should work.
The optimization solvers are implemented through `JuMP` and thus this package should be solver agnostic. All tests are conducted using `GLPK.jl` but other solvers should work (I use `Gurobi.jl` mostly).
## Quick Example
```@example
a = 1
b = 2
a+b
```
# Model IO
## Model structure
Before reading or writing models, it is important to understand how they are represented internally.
Each model is a type of `CobraTools.Model`, which is composed of a model `id`, arrays of `Reaction`s, `Metabolite`s and `Gene`s, and a gene reaction rules (`grrs`) dictionary.
The fields of `Reaction`, `Metabolite`, `Gene` types are shown below.
When reading or writing, these fields are what is used by `CobraTools.jl`.
```@docs
Model
```
Note, the format of `grrs` in `CobraTools.Model` is a nested array, like [[g1, g2], [g3, g4], ...], indexed by a reaction `id` key.
Each sub-array, e.g. [g1, g2], is composed of essential genes for the reaction to function.
Thus, if rxn1 requires (g1 and g2) or (g3 and g4) to function, then this would be represented by rxn1 => [[g1, g2], [g3, g4]] in `grrs`.
```@docs
Reaction
```
Note, the format of `grr` in `Reaction` is a string like `"(g1 and g2) or (g3 and g4)"`.
This string is parsed into the format used by `grrs` in `CobraTools.Model`, as explained above.
Also note, the metabolites dictionary field of `Reaction` maps a `Metabolite` to its stoichiometrix coefficient.
```@docs
Metabolite
```
```@docs
Gene
```
FIY: `JuMP` also exports a `Model` type, so you need to qualify which `Model` you are referring to when making a new function.
## Reading constraint based models
Currently, SBML, JSON and Matlab formatted models can be imported.
......@@ -63,7 +38,9 @@ model_location = joinpath(pwd(), "e_coli_json_model.json")
# model is a CobraTools.Model object previously imported or created
save_model(model, model_location)
rm(model_location) # hide
```
## Problems
Please let me know when you run into model import/export problems by filing an issue on the repo.
\ No newline at end of file
## IO Problems?
Please let me know when you run into model import/export problems by filing an issue.
\ No newline at end of file
# Model Construction
## Defining genes
Genes are represented by the `Gene` type in `CobraTools`, see [Model Structure](@ref) for details.
Likewise with `Metabolite`s, `Gene`s can be constructed using either an empty constructor or a constructor taking in only
the string `id` of the gene.
```@docs
Gene()
Gene(::String)
```
```@example
using CobraTools # hide
gene = Gene("gene1")
gene.name = "gene 1 name"
gene # pretty printing
```
Helper functions from Base have also been overwritten to make accessing arrays of genes easy.
```@@docs
findfirst(::Array{Gene, 1}, ::String)
getindex(::Array{Gene, 1}, ::Gene)
```
## Defining metabolites
Metabolites are represented by the `Metabolite` type in `CobraTools`, see [Model Structure](@ref) for details.
The simplest way to define a new metabolite is by using the empty constructor `Metabolite()`.
```@docs
Metabolite()
```
Alternatively, `Metabolite(id::String)` can be used to assign only the `id` field of the `Metabolite`.
```@docs
Metabolite(::String)
```
The other fields can be modified as usual, if desired.
```@example
using CobraTools
atp = Metabolite("atp")
atp.name = "Adenosine triphosphate"
atp.formula = "C10H12N5O13P3"
atp.charge = -4 # note that the charge is an Int
atp # pretty printing
```
Basic analysis of `Metabolite`s is also possible. The molecular structure of a metabolite can be inspected by calling
`get_atoms(met::Metabolite)`.
This function is useful for checking atom balances across reactions, or the entire model.
```@docs
CobraTools.get_atoms(met::Metabolite)
```
```@example
using CobraTools # hide
atp = Metabolite("atp") # hide
atp.name = "Adenosine triphosphate" # hide
atp.formula = "C10H12N5O13P3" # hide
get_atoms(atp)
```
Helper functions from Base have also been overwritten to make accessing arrays of metabolites easy.
```@@docs
findfirst(mets::Array{Metabolite, 1}, metid::String)
getindex(mets::Array{Metabolite, 1}, met::Metabolite)
```
## Defining reactions
Reactions are represented by the `Reaction` type in `CobraTools`, see [Model Structure](@ref) for details.
The simplest way to define a new reaction is by using the empty constructor `Reaction()`.
All the other fields still need to be assigned.
```@docs
Reaction()
```
Another option is to use `Reaction(id::String, metabolites::Dict{Metabolite, Float64}, dir="bidir")`, which
assigns the reaction `id`, the reaction stoichiometry (through the metabolite dictionary argument), and the directionality of the reaction.
The remaining fields still need to be assigned, if desired.
```@docs
Reaction(id::String, metabolites::Dict{Metabolite, Float64}, dir="bidir")
```
```@example
using CobraTools # hide
atp = Metabolite("atp") # hide
atp.name = "Adenosine triphosphate" # hide
atp.formula = "C10H12N5O13P3" # hide
atp.charge = -4 # hide
gene = Gene("gene1") # hide
adp = Metabolite("adp") # define another metabolite
metdict = Dict(atp => -1.0, adp => 1.0) # nb stoichiometries need to be floats
rxn = Reaction("dummy rxn", metdict, "for")
rxn.annotation["ec-code"] = ["0.0.0.0"]
rxn.grr = [[gene]] # only gene1 is required for this reaction to work
rxn # pretty printing
```
See the discussion in [Model Structure](@ref) about how to assign `grr` to a reaction.
Yet another way of defining a reaction is through overloading of the operators: `*, +, ∅, ⟶, →, ←, ⟵, ↔, ⟷`.
The longer and shorter arrows mean the same thing, i.e. `⟶` is the same as `→`, etc.
The other fields of the reaction still need to be set directly.
```@example
using CobraTools # hide
atp = Metabolite("atp") # hide
atp.name = "Adenosine triphosphate" # hide
atp.formula = "C10H12N5O13P3" # hide
atp.charge = -4 # hide
adp = Metabolite("adp") # hide
adp.formula = "C10H12N5O10P2" # hide
another_rxn = 2.0adp ⟶ 2.0*atp # forward reaction
another_rxn.id = "another dummy rxn"
another_rxn
```
When building exchange, demand, or sinks reactions the `∅` empty metabolite should be used to indicate that a metabolite is being created or destroyed.
```@example
using CobraTools # hide
adp = Metabolite("adp") # hide
adp.formula = "C10H12N5O10P2" # hide
ex_rxn = ∅ ⟷ adp # exchange reaction
```
It is also possible to check if a reaction is mass balanced by using `is_mass_balanced(rxn::Reaction)`.
Note, this function requires that all the metabolites in the reaction have formulas assigned to them to work properly.
```@docs
is_mass_balanced
```
```@example
using CobraTools # hide
atp = Metabolite("atp") # hide
atp.name = "Adenosine triphosphate" # hide
atp.formula = "C10H12N5O13P3" # hide
atp.charge = -4 # hide
adp = Metabolite("adp") # hide
adp.formula = "C10H12N5O10P2" # hide
unbal_rxn = 2.0adp ⟶ 1.0*atp # unbalanced reaction
is_mass_balanced(unbal_rxn)
```
Helper functions from Base have also been overwritten to make accessing arrays of reactions easy.
```@@docs
findfirst(rxns::Array{Reaction, 1}, rxnid::String)
getindex(rxns::Array{Reaction, 1}, rxn::Reaction)
```
## Building models
Once you have defined some metabolites, genes, and reactions, you can construct a model! This is most simply done by
using the empty model constructor:
```@docs
Model()
```
The fields of `CobraTools.Model` can then be assigned as usual.
```@example
using CobraTools
atp = Metabolite("atp")
adp = Metabolite("adp")
glc = Metabolite("glc")
h2o = Metabolite("h2o")
pp = Metabolite("pp") # pi (phosphate) usually means π, so pp is used here instead
lac = Metabolite("lac")
g1 = Gene("gene1")
g2 = Gene("gene2")
g3 = Gene("gene3")
g4 = Gene("gene4")
catabolism = glc + 2.0*adp + 2.0*pp ⟶ 2.0 * lac + 2.0 * h2o + 2.0 * atp
catabolism.id = "catabolism"
catabolism.grr = [[g1, g2], [g3, g4]]
anabolism = 10.0 * atp ⟶ 10.0*adp
anabolism.id = "anabolism"
glc_ex = ∅ ⟷ glc # exchanges are defined like this so negative fluxes mean import
lac_ex = ∅ ⟷ lac # positive flux means export
mets = [atp, adp, pp, h2o, glc, lac]
genes = [g1, g2, g3, g4]
rxns = [catabolism, anabolism, lac_ex, glc_ex]
model = Model()
model.id = "Test model"
add!(model, mets)
add!(model, rxns)
add!(model, genes)
model # pretty printing
```
Here the `add` functions were used to add the reactions, metabolites and genes to the model.
```@docs
add!(model::CobraTools.Model, rxns::Array{Reaction, 1})
add!(model::CobraTools.Model, mets::Array{Metabolite, 1})
add!(model::CobraTools.Model, genes::Array{Gene, 1})
```
Checking for duplicates can also be done.
Note that duplicate checking is NOT performed when models are imported.
If you suspect the model quality you should check each reaction, metabolite and gene yourself.
```@docs
is_duplicate(model::CobraTools.Model, rxn::Reaction)
is_duplicate(model::CobraTools.Model, cmet::Metabolite)
is_duplicate(model::CobraTools.Model, cgene::Gene)
```
```@example
using CobraTools # hide
atp = Metabolite("atp")
adp = Metabolite("adp")
anabolism = 10.0 * atp ⟶ 10.0*adp
anabolism.id = "anabolism"
anabolism2 = 10.0 * atp ⟶ 10.0*adp
anabolism.id = "anabolism2"
mets = [atp, adp]
rxns = [anabolism]
model = Model()
model.id = "Test model"
add!(model, mets)
add!(model, rxns)
is_duplicate(model, anabolism2)
```
Duplicate reactions, metabolites or genes can be removed using `rm!`.
```@docs
rm!(model::CobraTools.Model, rxns::Union{Array{Reaction, 1}, Reaction})
rm!(model::CobraTools.Model, mets::Union{Array{Metabolite, 1}, Metabolite})
rm!(model::CobraTools.Model, genes::Union{Array{Gene, 1}, Gene})
```
```@example
using CobraTools # hide
atp = Metabolite("atp")
atp2 = Metabolite("atp2")
mets = [atp, atp2]
model = Model()
add!(model, mets)
rm!(model, atp2)
```
A model can also be checked to ensure that no metabolites or genes are missing relative to the reactions.
`fix_model` also ensures that no extra metabolites are present.
```@docs
fix_model!(model::CobraTools.Model)
```
```@example
using CobraTools # hide
atp = Metabolite("atp")
adp = Metabolite("adp")
anabolism = 10.0 * atp ⟶ 10.0*adp
anabolism.id = "anabolism"
mets = [atp]
rxns = [anabolism]
model = Model()
model.id = "Test model"
add!(model, mets) # missing adp
add!(model, rxns)
fix_model!(model) # adp added
```
Helper functions from Base have also been overwritten to make accessing reactions, metabolites and genes easy from a model.
```@@docs
getindex(model::CobraTools.Model, rxn::Reaction)
getindex(model::CobraTools.Model, rxn::Metabolite)
getindex(model::CobraTools.Model, rxn::Gene)
```
\ No newline at end of file
# Model Structure
Before reading or writing models, it is important to understand how they are represented internally.
Each model is a type of `CobraTools.Model`, which is composed of a model `id`, and arrays of `Reaction`s, `Metabolite`s and `Gene`s.
The fields of `Reaction`, `Metabolite`, `Gene` types are shown below.
When reading or writing, these fields are what is used by `CobraTools.jl`.
```@docs
Model
```
```@docs
Reaction
```
Note, the format of `grr` in `Reaction` should be a nested array, like [[g1, g2], [g3, g4], ...].
Each sub-array, e.g. [g1, g2], is composed of essential genes (`g1::CobraTools.Gene`, etc.) for the reaction to function.
Thus, if the reaction requires (g1 and g2) or (g3 and g4) to function, then this would be represented by [[g1, g2], [g3, g4]] in `grr`.
Also note, the metabolites dictionary field of `Reaction` maps a `Metabolite` to its stoichiometrix coefficient.
```@docs
Metabolite
```
```@docs
Gene
```
FIY: `JuMP` also exports a `Model` type, so you need to qualify which `Model` you are referring to when making a new function.
\ No newline at end of file
......@@ -12,14 +12,18 @@ using Statistics
using Random
using PyCall
include("met_tools.jl")
include("rxn_tools.jl")
include("gene_tools.jl")
include("model_tools.jl")
# abstract types
abstract type ModelComponent end # for Reactions, Metabolites and Genes (all have IDs)
include("io_tools.jl")
include("construction_tools.jl")
# definitions of structs
include("metabolite.jl")
include("gene.jl")
include("reaction.jl")
include("model.jl")
# Tools
include("io_tools.jl")
include("construction_overloading.jl")
include("brenda_tools.jl")
include("equilibrator_tools.jl")
# include("basic_analysis.jl")
......@@ -27,10 +31,15 @@ include("equilibrator_tools.jl")
# export
= Metabolite("∅") # for exchange reactions
export , , , , , ,
export Reaction, Metabolite, Gene, Model
export
export Model, add!, rm!, is_duplicate, fix_model! # from model
export Gene # from gene
export Metabolite, get_atoms # from metabolite
export Reaction, is_mass_balanced # from reaction
export , , , , , # from construction_tools
export read_model, save_model # from io_tools
export build_cbm, fba, pfba, map_fluxes, set_bound, exchange_reactions, metabolite_fluxes
export read_model, save_model
# Initialization functions
......
"""
S, b, upper_bounds, lower_bounds = get_core_model(model::CobraTools.Model)
Return stoichiometrix matrix (S), mass balance right hand side (b), upper and lower bounds of constraint based model.
That is, S*v=b with lower_bounds ≤ v ≤ upper_bounds where v is the flux vector. This is useful if you want to construct
your own optimization problem.
"""
function get_core_model(model::CobraTools.Model)
ubs = [rxn.ub for rxn in model.reactions]
lbs = [rxn.lb for rxn in model.reactions]
b = spzeros(length(model.metabolites))
S = spzeros(length(model.metabolites), length(model.reactions))
metids = [met.id for met in model.metabolites] # need indices for S matrix construction
for (i, rxn) in enumerate(model.reactions) # column
for (met, coeff) in rxn.metabolites
j = findfirst(x -> x == met.id, metids) # row
isnothing(j) ? (@error "S matrix construction error: $(met.id) not defined."; continue) : nothing
S[j, i] = coeff
end
end
return S, b, ubs, lbs
end
"""
build_cbm(model::CobraTools.Model)
......@@ -169,23 +194,23 @@ function pfba(model::CobraTools.Model, objective_rxns::Union{Reaction, Array{Rea
end
"""
rxn_flux_dict = map_fluxes(v, model::CobraTools.Model)
map_fluxes(v, model::CobraTools.Model)
Map fluxes from an optimization problem (v) to rxns in a model. v can be a JuMP object (fluxes) or an array of Float64 fluxes.
Assumes they are in order, which they should be since they are constructed from model.
"""
function map_fluxes(v::Array{VariableRef,1}, model::CobraTools.Model)
rxndict = Dict{String, Float64}()
for i in eachindex(model.rxns)
rxndict[model.rxns[i].id] = value(v[i])
for i in eachindex(model.reactions)
rxndict[model.reactions[i].id] = value(v[i])
end
return rxndict
end
function map_fluxes(v::Array{Float64,1}, model::CobraTools.Model)
rxndict = Dict{String, Float64}()
for i in eachindex(model.rxns)
rxndict[model.rxns[i].id] = v[i]
for i in eachindex(model.reactions)
rxndict[model.reactions[i].id] = v[i]
end
return rxndict
end
......
......@@ -56,7 +56,7 @@ Forward reaction.
"""
function(substrates::Union{Metabolite, MetaboliteWithCoefficient, Array{MetaboliteWithCoefficient, 1}}, products::Union{Metabolite, MetaboliteWithCoefficient, Array{MetaboliteWithCoefficient, 1}})
metdict = mkrxn(substrates, products)
return Reaction(metdict, "for")
return Reaction("", metdict, "for")
end
const =
......@@ -65,7 +65,7 @@ Reverse only reaction.
"""
function(substrates::Union{Metabolite, MetaboliteWithCoefficient, Array{MetaboliteWithCoefficient, 1}}, products::Union{Metabolite, MetaboliteWithCoefficient, Array{MetaboliteWithCoefficient, 1}})
metdict = mkrxn(substrates, products)
return Reaction(metdict, "rev")
return Reaction("", metdict, "rev")
end
const =
......@@ -74,6 +74,6 @@ Bidirectional (reversible) reaction.
"""
function(substrates::Union{Metabolite, MetaboliteWithCoefficient, Array{MetaboliteWithCoefficient, 1}}, products::Union{Metabolite, MetaboliteWithCoefficient, Array{MetaboliteWithCoefficient, 1}})
metdict = mkrxn(substrates, products)
return Reaction(metdict, "bidir")
return Reaction("", metdict, "bidir")
end
const =
\ No newline at end of file
......@@ -9,7 +9,7 @@ notes :: Dict{String, Array{String, 1}}
annotation :: Dict{String, Union{Array{String, 1}, String}}
````
"""
mutable struct Gene
mutable struct Gene <: ModelComponent
id :: String
name :: String
notes :: Dict{String, Array{String, 1}}
......@@ -17,7 +17,7 @@ mutable struct Gene
end
"""
gene = Gene()
Gene()
Empty gene constructor.
"""
......@@ -30,43 +30,22 @@ function Gene()
end
"""
gene = Gene(gene_dict :: Dict{String, Any},)
Gene(id::String)
Assign a gene based on the fields contained in gene_dict.
Assign gene with only an id.
"""
function Gene(d)
id = ""
function Gene(id::String)
name = ""
notes = Dict{String, Array{String, 1}}()
annotation = Dict{String, Union{Array{String, 1}, String}}()
for (k, v) in d
if k == "id"
id = v
elseif k == "name"
name = v
elseif k == "notes"
notes = Dict{String, Array{String, 1}}(kk=>vv for (kk, vv) in v)
elseif k == "annotation"
annotation = Dict{String, Union{Array{String, 1}, String}}()
for (kk, vv) in v
if typeof(vv) == String
annotation[kk] = vv
else
annotation[kk] = convert(Array{String, 1}, vv)
end
end
else
@warn "Unrecognized reaction field: $k"
end
end
Gene(id, name, notes, annotation)
end
"""
index = getindex(genes::Array{Gene, 1}, gene::Gene)
getindex(genes::Array{Gene, 1}, gene::Gene)
Get the index of a gene in an array of genes. Return -1 if not found.
Typically used `genes[gene] = index`.
"""
function Base.getindex(genes::Array{Gene, 1}, gene::Gene)
for i in eachindex(genes)
......@@ -78,7 +57,7 @@ function Base.getindex(genes::Array{Gene, 1}, gene::Gene)
end
"""
findfirst(genes::Array{Gene, 1}, geneid::String)
findfirst(genes::Array{Gene, 1}, geneid::String)
Return the gene with geneid or else `nothing`. Typically used: findfirst(model.genes, geneid)
"""
......@@ -91,10 +70,36 @@ function Base.findfirst(genes::Array{Gene, 1}, geneid::String)
return nothing
end
"""
_is_duplicate(genes::Array{Gene, 1}, gene::Gene)
Check if gene already exists in genes but has another id.
First check if the ids are the same.
If not, check if any of the annotations are the same.
"""
function _is_duplicate(genes::Array{Gene, 1}, cgene::Gene)
for gene in genes
if gene.id != cgene.id # not same id
if any([x in get(cgene.annotation, "ncbigene", ["c1"]) for x in get(gene.annotation, "ncbigene", ["c2"])]) ||
any([x in get(cgene.annotation, "ncbigi", ["c1"]) for x in get(gene.annotation, "ncbigi", ["c2"])]) ||
any([x in get(cgene.annotation, "refseq_locus_tag", ["c1"]) for x in get(gene.annotation, "refseq_locus_tag", ["c2"])]) ||
any([x in get(cgene.annotation, "refseq_name", ["c1"]) for x in get(gene.annotation, "refseq_name", ["c2"])]) ||
any([x in get(cgene.annotation, "refseq_synonym", ["c1"]) for x in get(gene.annotation, "refseq_synonym", ["c2"])]) ||
any([x in get(cgene.annotation, "uniprot", ["c1"]) for x in get(gene.annotation, "uniprot", ["c2"])])
return true, genes[gene]
end
else
return true, genes[gene]
end
end
return false, -1
end
"""
Pretty printing of gene::Gene.
"""
function Base.show(io::IO, g::Gene)
function Base.show(io::IO, ::MIME"text/plain", g::Gene)
println(io, "Gene ID: ", g.id)
println(io, "Gene name: ", g.name)
end
......@@ -106,3 +111,13 @@ function Base.show(io::IO, ::MIME"text/plain", gs::Array{Gene, 1})
println(io, "Gene set of length: ", length(gs))
end
"""
Pretty printing of grr::Array{Array{Gene,1},1}.
"""
function Base.show(io::IO, ::MIME"text/plain", grr::Array{Array{Gene,1},1})
grr_strings = String[]